Abstract

We propose three new discrete variational schemes that capture the conservative-dissipative structure of a generalized Kramers equation. The first two schemes are single-step minimization schemes, while the third combines a streaming step and a minimization step. The cost functionals in the schemes are inspired by the rate functional in the Freidlin-Wentzell theory of large deviations for the underlying stochastic system. We prove that all three schemes converge to the solution of the generalized Kramers equation.

1 Introduction

1.1 The Kramers equation

In this paper we discuss the variational structure of a generalized Kramers equation,

\[ \partial_t \rho = -\operatorname{div}_q\Big(\rho\,\frac{p}{m}\Big) + \operatorname{div}_p(\rho\,\nabla_q V) + \gamma\,\operatorname{div}_p(\rho\,\nabla_p F) + \gamma kT\,\Delta_p \rho, \quad \text{in } \mathbb{R}^{2d}\times\mathbb{R}^+, \tag{1} \]

which is the Fokker-Planck or forward Kolmogorov equation of the stochastic differential equation

\[ dQ(t) = \frac{P(t)}{m}\,dt, \tag{2a} \]
\[ dP(t) = -\nabla V(Q(t))\,dt - \gamma\nabla F(P(t))\,dt + \sqrt{2\gamma kT}\,dW(t). \tag{2b} \]

The system (2) describes the movement of a particle at position $Q$ and with momentum $P$ under
the influence of three forces. One force is the derivative $-\nabla V$ of a background potential $V = V(Q)$,
the second is a friction force $-\gamma\nabla F(P)$, and the third is a stochastic perturbation generated by a
Wiener process $W$. The parameter $m > 0$ is the mass of the particle (so that the velocity is $P/m$),
$\gamma$ is a friction parameter, $k$ is the Boltzmann constant, and $T$ is the temperature of the noise. A
common choice for $F$ is $F(P) = |P|^2/2m$, which results in a linear friction force.

For a stochastic particle given by (2), $\rho = \rho(t,q,p)$ characterizes the probability of finding the
particle at time $t$ at position $q$ and with momentum $p$. Equation (1) characterizes the evolution
of this probability density over time. The three deterministic drift terms in (2) lead to convection
terms in (1), and the noise results in the final term in (1). We use the notation $\operatorname{div}_q$ and similar
to indicate that the differential operator acts only on one variable.
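As an illustration of the dynamics (2), the following minimal sketch integrates the system with the Euler-Maruyama method. It is not part of the original paper: the function name `simulate_kramers`, the step size `dt`, and the callables `grad_V` and `grad_F` are hypothetical choices made for this example, and explicit time stepping is only one of many possible discretizations.

```python
import numpy as np

def simulate_kramers(q0, p0, grad_V, grad_F, m, gamma, kT, dt, n_steps, rng=None):
    """Euler-Maruyama discretization of (2a)-(2b) (illustrative sketch only).

    grad_V and grad_F are user-supplied callables returning the gradients
    of V and F; all names here are assumptions made for this example.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = np.array(q0, dtype=float)
    p = np.array(p0, dtype=float)
    # Standard deviation of the noise increment sqrt(2 gamma kT) dW over one step.
    noise_scale = np.sqrt(2.0 * gamma * kT * dt)
    for _ in range(n_steps):
        dW = rng.standard_normal(p.shape)
        q_new = q + (p / m) * dt                                   # (2a)
        p_new = (p - grad_V(q) * dt - gamma * grad_F(p) * dt
                 + noise_scale * dW)                               # (2b)
        q, p = q_new, p_new
    return q, p
```

With `kT = 0`, `V = 0`, and the linear friction choice `F(P) = |P|^2/2m`, the recursion reduces to `P_{n+1} = (1 - gamma*dt/m) P_n`, which gives a simple sanity check of the implementation.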
Both equations describe the behaviour of a Brownian particle with inertia [Bro28], such as one
which is large enough to be distinguished from the molecules in the surrounding solvent, but
small enough to show random behaviour arising from collisions with those same molecules. Both
the friction force and the noise term arise from collisions with the solvent, and the parameter $\gamma$
characterizes the intensity of these collisions. The parameter $kT$ measures the mean kinetic energy
of the solvent molecules, and therefore characterizes the magnitude of the collision noise. A major
application of this system is as a simplified model for chemical reactions, and it is in this context
that Kramers originally introduced it [Kra40].

The aim of this paper is to discuss variational formulations for equation (1). The theory of 
such variational structures took off with the introduction of Wasserstein gradient flows by [JKO97,
JKO98] and of the energetic approach to rate-independent processes [MTL02, Mie05]. Both have
changed the theory of evolution equations in many ways. If a given evolution equation has such a 
variational structure, then this property gives strong restrictions on the type of behaviour of such 
a system, provides general methods for proving well-posedness [AGS08] and characterizing large- 
time behaviour (e.g., [CMV03]), gives rise to natural numerical discretizations (e.g., [DMM10]), 
and creates handles for the analysis of singular limits (e.g., [SS04, Ste08, AMP+12]). Because of 
this wide range of tools, the study of variational structure has important consequences for the 
analysis of an evolution equation. 

Remark 1.1. A brief word about dimensions. We make the unusual choice of preserving the
dimensional form of the equations, because the explicit constants help in identifying the modelling
origin and roles of the different terms and effects, and these aspects are central to this paper.
Therefore $Q$ and $q$ are expressed in m, $P$ and $p$ in kg m/s, $m$ in kg, $V$, $F$, and $kT$ in J, and $\gamma$ in
kg/s. The density $\rho$ has dimensions such that $\int\rho$ is dimensionless. This setup implies that the
Wiener process has dimension $\sqrt{\mathrm{s}}$, in accordance with the formal property $dW^2 = dt$.



1.2 Variational evolution 



To avoid confusion between the Boltzmann constant and the integer $k$, from now on we define
$\beta^{-1} := kT$. The authors of [JKO98] studied an equation that can be seen as a simpler, spatially
homogeneous case of (1), where $\rho = \rho(t,p)$:

\[ \partial_t \rho = \gamma\beta^{-1}\Delta_p\rho + \gamma\operatorname{div}_p(\rho\nabla_p F). \tag{3} \]

They showed that this equation is a gradient flow of the free energy

\[ A_p(\rho) := \int_{\mathbb{R}^d}\big[\rho F + \beta^{-1}\rho\log\rho\big]\,dp \]

with respect to the Wasserstein metric. This statement can be made precise in a variety of different
ways (see [AGS08] for a thorough treatment of this subject); for the purpose of this paper the most
useful one is that the solution $t \mapsto \rho(t,p)$ can be approximated by the time-discrete sequence $\rho_k$
defined recursively by

\[ \rho_k \in \operatorname*{argmin}_{\rho} K_h(\rho,\rho_{k-1}), \qquad K_h(\rho,\rho_{k-1}) := \frac{1}{2h\gamma}\,d(\rho,\rho_{k-1})^2 + A_p(\rho). \tag{4} \]

Here $d$ is the Wasserstein distance between two probability measures $\rho_0(x)dx$ and $\rho(y)dy$ with
finite second moment,

\[ d(\rho_0,\rho)^2 := \inf_{P\in\Gamma(\rho_0,\rho)} \int_{\mathbb{R}^d\times\mathbb{R}^d} |x-y|^2\,P(dx\,dy), \]

where $\Gamma(\rho_0,\rho)$ is the set of all probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\rho_0$ and $\rho$,

\[ \Gamma(\rho_0,\rho) = \big\{P\in\mathcal{P}(\mathbb{R}^d\times\mathbb{R}^d) : P(A\times\mathbb{R}^d) = \rho_0(A),\ P(\mathbb{R}^d\times A) = \rho(A) \text{ for all Borel subsets } A\subset\mathbb{R}^d\big\}. \tag{5} \]

A consequence of this gradient-flow structure is that $A_p$ decreases along solutions of (3).
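For intuition, the infimum in the definition of $d$ can be evaluated explicitly in one space dimension for two empirical measures with equally many, equally weighted atoms: in that special case the optimal coupling pairs the sorted atoms. The sketch below relies on this known fact; the function name is hypothetical, and the restriction to $d = 1$ is an assumption of the example, not of the paper.

```python
import numpy as np

def wasserstein2_squared_1d(x, y):
    """Squared Wasserstein distance d^2 between two empirical measures on R.

    Both measures are assumed to have the same number n of atoms, each of
    weight 1/n. In 1D the monotone (sorted) coupling is optimal for the
    quadratic cost |x - y|^2.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both samples must contain the same number of atoms")
    return float(np.mean((x - y) ** 2))
```

For example, translating a sample by a constant `a` yields the squared distance `a**2`, as expected for the quadratic cost.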

Unfortunately, a convincing generalization of this gradient-flow concept and corresponding 
theory to equations such as the Kramers equation is still lacking. This is related to the mixture 
of both dissipative and conservative effects in these equations, which we now explain. 



1.3 A combination of conservative and dissipative effects 



The full Kramers equation (1) is a mixture of the dissipative behaviour described by (3) and
a Hamiltonian, conservative behaviour. The conservative behaviour can be recognized by setting
$\gamma = 0$, thus discarding the last two terms in (2); what remains in (2) is a deterministic Hamiltonian
system with Hamiltonian energy $H(q,p) = p^2/2m + V(q)$. The evolution of this system is reversible
and conserves $H$. Correspondingly, the evolution of (1) with $\gamma = 0$ is also reversible and conserves
the expectation of $H$,

\[ \mathcal{H}(\rho) := \int_{\mathbb{R}^{2d}} \rho(q,p)\,H(q,p)\,dq\,dp. \]

On the other hand, as suggested by the discussion in the previous section, the $\gamma$-dependent
terms represent dissipative effects. In the variational schemes that we define below, a central role
is played by the $(q,p)$-dependent analogue of $A_p$,

\[ A(\rho) := \int_{\mathbb{R}^{2d}} \big[\rho(q,p)F(p) + \beta^{-1}\rho(q,p)\log\rho(q,p)\big]\,dq\,dp. \]

Because of the special structure of (1), the functional $A$ does not decrease along solutions, but in
the particular case $F(p) := p^2/2m$, a 'total free energy' functional does: setting

\[ \mathcal{E}(\rho) := A(\rho) + \int \rho V\,dq\,dp = \int \big[H + \beta^{-1}\log\rho\big]\,\rho\,dq\,dp, \]

we calculate that

\[ \partial_t\,\mathcal{E}(\rho(t)) = -\gamma \int_{\mathbb{R}^{2d}} \frac{1}{\rho(t,q,p)}\,\Big|\rho(t,q,p)\,\frac{p}{m} + \beta^{-1}\nabla_p\rho(t,q,p)\Big|^2\,dq\,dp \le 0. \tag{6} \]

The choice $F(p) = p^2/2m$ is related to the fluctuation-dissipation theorem, and we comment on
this in Section 1.8.
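The calculation behind (6) can be sketched as follows. This is a formal derivation, under the assumption that $\rho$ is smooth, positive, and decays fast enough at infinity for all integrations by parts to be justified; the notation $X_H := (p/m, -\nabla V)$ for the Hamiltonian vector field is introduced here only for this sketch. The Hamiltonian contribution vanishes because $X_H\cdot\nabla_{q,p}H = 0$ and $\operatorname{div}_{q,p}X_H = 0$.

```latex
\begin{align*}
\partial_t \rho
  &= -\operatorname{div}_{q,p}(\rho X_H)
     + \gamma \operatorname{div}_p\Bigl(\rho\,\frac{p}{m} + \beta^{-1}\nabla_p\rho\Bigr)
     \qquad\text{(equation (1) with } F(p) = p^2/2m\text{)}, \\
\frac{d}{dt}\mathcal{E}(\rho)
  &= \int_{\mathbb{R}^{2d}} \bigl(H + \beta^{-1}\log\rho + \beta^{-1}\bigr)\,\partial_t\rho \,dq\,dp \\
  &= \underbrace{\int \rho\, X_H\cdot\nabla_{q,p}\bigl(H + \beta^{-1}\log\rho\bigr)\,dq\,dp}_{=\,0}
     \;-\; \gamma\int \nabla_p\bigl(H + \beta^{-1}\log\rho\bigr)\cdot
       \Bigl(\rho\,\frac{p}{m} + \beta^{-1}\nabla_p\rho\Bigr)\,dq\,dp \\
  &= -\gamma\int \frac{1}{\rho}\,\Bigl|\rho\,\frac{p}{m} + \beta^{-1}\nabla_p\rho\Bigr|^2\,dq\,dp \;\le\; 0 .
\end{align*}
```

The last step uses $\nabla_p(H + \beta^{-1}\log\rho) = \rho^{-1}\bigl(\rho\,p/m + \beta^{-1}\nabla_p\rho\bigr)$.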

Because of the conservative, Hamiltonian terms, equation (1) is not a gradient flow, and an
approach such as [JKO98] is not possible. In 2000 Huang [Hua00] proposed a variational scheme
that is inspired by [JKO98], but modified to account for the conservative effects, and in this paper
we describe three more variational schemes for the same equation.



1.4 Huang's discrete schemes for the Kramers equation 



The time-discrete variational schemes of Huang and of this paper are best understood
through the connection between gradient flows on one hand and large deviations on the other. We 
have recently shown this connection for a number of systems [ADPZ11, PR11, DLZ11, DLR12], 
including (3). 

The philosophy can be formulated in a number of ways, and here we choose a perspective
based on the behaviour of a single particle. We start with the simpler case of equation (3) and
the discrete approximation (4). Let $\{X^\varepsilon\}_{\varepsilon>0}$ be a rescaled $d$-dimensional Wiener process,

\[ dX^\varepsilon(t) = \sqrt{2\sigma\varepsilon}\,dW(t), \tag{7} \]

where $\sigma$ is a mobility coefficient. If we fix $h > 0$, then by Schilder's theorem (e.g. [DZ87, Th. 5.2.3])
the process $\{X^\varepsilon(t) : t\in[0,h]\}$ satisfies a large-deviation principle

\[ \mathrm{Prob}\big(X^\varepsilon(\cdot)\approx\xi(\cdot)\big) \sim \exp\Big[-\frac{1}{\varepsilon}\,I(\xi)\Big] \quad\text{as } \varepsilon\to 0, \]

where the rate functional $I : C([0,h];\mathbb{R}^d)\to\mathbb{R}\cup\{+\infty\}$ is given by

\[ I(\xi) = \frac{1}{4\sigma}\int_0^h |\dot\xi(t)|^2\,dt. \]

The Wasserstein cost function $|x-y|^2$ can be written in terms of $I$ as

\[ |x-y|^2 = 4h\sigma\,\inf\big\{I(\xi) : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } \xi(0) = x,\ \xi(h) = y\big\}. \tag{8} \]

Hence the cost $|x-y|^2$ can be interpreted as the probability that a Brownian particle goes
from $x$ to $y$ in time $h$, in the sense of large deviations, and rescaled so as to be independent of the
magnitude of the noise $\sigma$.

The results of [ADPZ11, PR11, DLR12] concern a similar large-deviation analysis, but now
for the empirical measure of a large number $n$ of particles. For this system the limit $n\to\infty$ plays
a role similar to $\varepsilon\to 0$ in the example above. In [ADPZ11, PR11, DLR12], it is shown that this
rate functional is very similar to the right-hand side of (4) in the limit $h\to 0$. This result explains
the strong connection between large deviations on one hand and the gradient-flow structure on
the other.

However, the core of the argument of [ADPZ11, PR11, DLR12] is contained in the Schilder
example (7) and its connection (8) to the Wasserstein cost. Hence we use this simpler point of
view to generalize the approximation scheme (4) to the Kramers equation. There are two different
ways of doing this.

Approach 1 [Hua00]. Instead of the inertia-less Brownian particle given by (7), we consider a
particle with inertia satisfying

\[ dQ^\varepsilon(t) = \frac{P^\varepsilon(t)}{m}\,dt, \tag{9a} \]
\[ dP^\varepsilon(t) = \sqrt{2\varepsilon\gamma\beta^{-1}}\,dW(t), \tag{9b} \]

which can formally also be written as

\[ m\,\frac{d^2}{dt^2}Q^\varepsilon(t) = \sqrt{2\gamma\beta^{-1}\varepsilon}\,\frac{dW}{dt}(t). \]

By the Freidlin-Wentzell theorem (e.g. [DZ87, Th. 5.6.3]), the process $Q^\varepsilon(t)$ satisfies a similar
large-deviation principle with rate functional $I : C([0,h],\mathbb{R}^d)\to\mathbb{R}\cup\{+\infty\}$ given by

\[ I(\xi) = \frac{1}{4\gamma\beta^{-1}}\int_0^h |m\ddot\xi(t)|^2\,dt. \]

The comparison with (8) suggests defining a cost functional $\widetilde{C}_h(q,p;q',p')$ in a similar way, i.e.

\[ \widetilde{C}_h(q,p;q',p') := h\,\inf\Big\{\int_0^h |m\ddot\xi(t)|^2\,dt : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } (\xi,m\dot\xi)(0) = (q,p),\ (\xi,m\dot\xi)(h) = (q',p')\Big\} \]
\[ \phantom{\widetilde{C}_h(q,p;q',p')} = |p'-p|^2 + 12\,\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2. \tag{10} \]

The second formula follows from an explicit calculation of the minimizer. As above, the
interpretation is that of the probabilistic 'cost', that is, the large-deviations characterization of the
probability of a path of (9) connecting $(q,p)$ to $(q',p')$ over time $h$. Note that $\widetilde{C}_h$ is not a metric,
since it is not symmetric, and also $\widetilde{C}_h(q,p;q,p) = 12|p|^2$ generally does not vanish. Therefore the
Wasserstein 'distance' $\widetilde{W}_h$ defined with $\widetilde{C}_h$ as cost is not a metric, but only an optimal-transport
cost (see [Vil03] for an exposition on the theory of optimal transportation).
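The explicit formula in (10) can be checked numerically. The sketch below evaluates the closed form and compares it with $h\int_0^h |m\ddot\xi|^2\,dt$ computed along the cubic polynomial that matches the boundary data (the minimizer satisfies $\xi'''' = 0$). The function names and the quadrature resolution `n` are hypothetical choices made for this illustration only.

```python
import numpy as np

def huang_cost(q, p, q_new, p_new, m, h):
    """Closed form in (10): |p'-p|^2 + 12 |(m/h)(q'-q) - (p'+p)/2|^2."""
    q, p, q_new, p_new = (np.asarray(a, dtype=float) for a in (q, p, q_new, p_new))
    return float(np.sum((p_new - p) ** 2)
                 + 12.0 * np.sum((m / h * (q_new - q) - 0.5 * (p_new + p)) ** 2))

def cubic_path_cost(q, p, q_new, p_new, m, h, n=2001):
    """h * int_0^h |m xi''(t)|^2 dt along the cubic xi(t) = q + a t + b t^2 + c t^3
    with (xi, m xi')(0) = (q, p) and (xi, m xi')(h) = (q', p')."""
    q, p, q_new, p_new = (np.asarray(a, dtype=float) for a in (q, p, q_new, p_new))
    D = q_new - q - p * h / m          # position mismatch after free streaming
    E = (p_new - p) / m                # velocity increment
    b = 3.0 * D / h**2 - E / h
    c = E / h**2 - 2.0 * D / h**3
    t = np.linspace(0.0, h, n)
    acc = 2.0 * b[None, :] + 6.0 * c[None, :] * t[:, None]   # xi''(t)
    integrand = np.sum((m * acc) ** 2, axis=1)
    # Composite trapezoidal rule, written out to avoid version-specific helpers.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return float(h * integral)
```

Setting `q_new = q` and `p_new = p` in `huang_cost` returns `12 * |p|**2`, which illustrates the remark that the cost does not vanish on the diagonal.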
Huang then defines the approximation scheme as

Scheme 1 [Hua00]. Given a previous state $\rho_{k-1}$, define $\rho_k$ as the solution of the minimization
problem

\[ \min_{\rho}\ \frac{1}{2h\gamma}\,\widetilde{W}_h(\rho_{k-1},\rho) + A(\rho) + \frac{2m}{\gamma h}\int \rho(q,p)\,V(q)\,dq\,dp, \tag{11} \]

where $\widetilde{W}_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $\widetilde{C}_h$.

Huang proves [Hua00, Hua11] that the approximations generated by this scheme indeed
converge to the solution of (1) as $h\to 0$.

1.5 Criticism 

Although Scheme 1 is similar in form to (4), there are in fact important issues
with this scheme:

1. In (1), the dissipative effects are represented by the terms prefixed by $\gamma$, and the conservative
effects by the Hamiltonian terms $\operatorname{div}_q(\rho p/m)$ and $\operatorname{div}_p(\rho\nabla V)$. It would be natural to see
these effects play separate roles in the variational formulation. However, in Scheme 1 the
effects are mixed, since the final term in (11) mixes conservative effects (represented by $V$
and $m$) with dissipative effects (the prefactor $\gamma$, and the role as driving force in a
gradient-flow-type minimization).

2. The dependence on $h$ of the final term in (11) adds to the confusion; since this parameter is
an approximation parameter chosen independently from the actual system, the combination
$A + \frac{2m}{\gamma h}\int\rho V$ cannot be considered a single driving potential.

3. In fact, in the standard case $F(p) = p^2/2m$ the sum of $A$ and $\int\rho V$ is a natural object, since
it represents total free energy and decreases along solutions (see Section 1.3). Note how the
coefficient in this sum is 1 instead of $2m/\gamma h$.

The way in which $V$ appears in Scheme 1 can be traced back to the fact that of the two conservative
terms in (1) and (2), only $P/m$ is represented in the definition of the cost $\widetilde{C}_h$, in the right-hand
side of (9a); the term $\nabla V$ is missing in (9). Therefore the scheme has to compensate for the other
term $\nabla V$ in a different manner.

These arguments lead us to pose the following question, which is the central topic of this
paper:

Can we construct an approximation scheme that respects the conservative-dissipative split?

The answer is 'yes', and in the rest of this paper we explain how; in fact we detail three different
schemes, corresponding to different ways of answering this question.



1.6 The schemes of this paper 

We take a different approach than Huang did. 

Approach 2. To set up a new cost functional, we first return to the single-particle point of view,
as in (7) and (9). We now take a particle whose behaviour is a combination of the two Hamiltonian
terms in (2) and a noise term:

\[ dQ^\varepsilon(t) = \frac{P^\varepsilon(t)}{m}\,dt, \tag{12a} \]
\[ dP^\varepsilon(t) = -\nabla V(Q^\varepsilon(t))\,dt + \sqrt{2\gamma\beta^{-1}\varepsilon}\,dW(t), \tag{12b} \]

which again can formally be written as

\[ m\,\frac{d^2}{dt^2}Q^\varepsilon(t) + \nabla V(Q^\varepsilon(t)) = \sqrt{2\gamma\beta^{-1}\varepsilon}\,\frac{dW}{dt}(t). \]

Note how this system differs from (9) by the term involving $\nabla V$ in (12b).

A very similar application of the Freidlin-Wentzell theorem states that $Q^\varepsilon$ satisfies a
large-deviation principle as $\varepsilon\to 0$ with rate function

\[ I(\xi) = \frac{1}{4\gamma\beta^{-1}}\int_0^h \big|m\ddot\xi(t) + \nabla V(\xi(t))\big|^2\,dt. \]

This leads to the following scheme.

Scheme 2a. We define the cost to be

\[ C_h(q,p;q',p') := h\,\inf\Big\{\int_0^h \big|m\ddot\xi(t) + \nabla V(\xi(t))\big|^2\,dt : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } (\xi,m\dot\xi)(0) = (q,p),\ (\xi,m\dot\xi)(h) = (q',p')\Big\}. \tag{13} \]

Given a previous state $\rho_{k-1}$, define $\rho_k$ as the solution of the minimization problem

\[ \min_{\rho}\ \frac{1}{2h\gamma}\,W_h(\rho_{k-1},\rho) + A(\rho), \tag{14} \]

where $W_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $C_h$.

Note how now the term involving $V$ has disappeared from the minimization problem (14). In
Sections 4-6 we show that this approximation scheme converges to the solution of (1) as $h\to 0$.

For practical purposes it is inconvenient that the cost $C_h$ in (13) has no explicit expression.
It turns out that we may approximate $C_h$ with an explicit expression and obtain the same limiting
behaviour.

Scheme 2b. Define

\[ \overline{C}_h(q,p;q',p') := h\,\inf\Big\{\int_0^h \big|m\ddot\xi(t) + \nabla V(q)\big|^2\,dt : (\xi,m\dot\xi)(0) = (q,p),\ (\xi,m\dot\xi)(h) = (q',p')\Big\} \]
\[ \phantom{\overline{C}_h(q,p;q',p')} = |p'-p+h\nabla V(q)|^2 + 12\,\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2 \]
\[ \phantom{\overline{C}_h(q,p;q',p')} = |p'-p|^2 + 12\,\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2 + 2h\,(p'-p)\cdot\nabla V(q) + h^2|\nabla V(q)|^2. \tag{15} \]

Given a previous state $\rho_{k-1}$, define $\rho_k$ as the solution of the minimization problem

\[ \min_{\rho}\ \frac{1}{2h\gamma}\,\overline{W}_h(\rho_{k-1},\rho) + A(\rho), \tag{16} \]

where $\overline{W}_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $\overline{C}_h$.

Note how $\overline{C}_h$ differs from (13) in that $\xi(t)$ is replaced by $q$ in $\nabla V$. This approximation is
exact when $V$ is linear. We prove the convergence of solutions of Scheme 2b in Sections 4-6.
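Because the cost of Scheme 2b is explicit, it is straightforward to evaluate. The following sketch implements the two closed-form expressions in (15) and can be used to confirm that they agree; the function names and the callable `grad_V` are hypothetical and serve only this illustration.

```python
import numpy as np

def transport_term(q, p, q_new, p_new, m, h):
    """12 |(m/h)(q'-q) - (p'+p)/2|^2, the part of (15) shared with the cost (10)."""
    return 12.0 * np.sum((m / h * (q_new - q) - 0.5 * (p_new + p)) ** 2)

def cost_2b(q, p, q_new, p_new, grad_V, m, h):
    """First closed form in (15): the force is frozen at the starting point q."""
    q, p, q_new, p_new = (np.asarray(a, dtype=float) for a in (q, p, q_new, p_new))
    g = grad_V(q)
    return float(np.sum((p_new - p + h * g) ** 2)
                 + transport_term(q, p, q_new, p_new, m, h))

def cost_2b_expanded(q, p, q_new, p_new, grad_V, m, h):
    """Second closed form in (15), with the square expanded."""
    q, p, q_new, p_new = (np.asarray(a, dtype=float) for a in (q, p, q_new, p_new))
    g = grad_V(q)
    dp = p_new - p
    return float(np.sum(dp ** 2) + transport_term(q, p, q_new, p_new, m, h)
                 + 2.0 * h * np.sum(dp * g) + h ** 2 * np.sum(g ** 2))
```

With `grad_V` identically zero both functions reduce to the closed form in (10).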
Neither of the costs $C_h$ and $\overline{C}_h$ gives rise to a metric, since they are asymmetric and do not
vanish when $(q',p') = (q,p)$. It is possible to construct a two-step scheme with a symmetric cost
and corresponding metric $\widehat{W}_h$.

Scheme 2c. Define

\[ \widehat{C}_h(q,p;q',p') := |p'-p|^2 + 12\,\Big|\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big|^2 + 2m\,(q'-q)\cdot\big(\nabla V(q') - \nabla V(q)\big). \tag{17} \]

Assume $\rho^h_{k-1}$ is given, and define the single-step, backwards approximate streaming operator

\[ \sigma_h(q,p) := \Big(q - h\,\frac{p}{m},\ p + h\nabla V(q)\Big). \tag{18} \]

Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ in two steps.
Hamiltonian step: First determine $\mu^h_k(q,p)$ such that

\[ \mu^h_k(q,p) := \sigma_h^{-1}(q,p)_\#\,\rho^h_{k-1}(q,p), \tag{19} \]

where $\#$ denotes the push-forward operator.

Gradient flow step: Then determine $\rho^h_k$ that minimizes

\[ \min_{\rho}\ \frac{1}{2h\gamma}\,\widehat{W}_h(\mu^h_k,\rho) + A(\rho), \tag{20} \]

where $\widehat{W}_h$ is the metric on $\mathbb{R}^{2d}$ generated by the cost function $\widehat{C}_h$.
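The Hamiltonian step of Scheme 2c can be illustrated on a cloud of sample points. In the sketch below the streaming operator (18) is implemented directly, and its inverse is computed by a fixed-point iteration, which is one possible way (assumed here, not taken from the paper) of carrying out the corresponding implicit Euler step; the iteration is a contraction when $h^2/m$ times the Lipschitz constant of $\nabla V$ is less than one. Function names, the iteration count, and the particle representation of the measures are all hypothetical. The symmetric cost (17) is included so that its symmetry and vanishing on the diagonal can be checked.

```python
import numpy as np

def sigma_h(q, p, grad_V, m, h):
    """Backwards approximate streaming operator (18)."""
    return q - h * p / m, p + h * grad_V(q)

def sigma_h_inverse(q0, p0, grad_V, m, h, n_iter=50):
    """Solve sigma_h(q, p) = (q0, p0) for (q, p) by fixed-point iteration.

    The unknown q satisfies q = q0 + (h/m) * (p0 - h * grad_V(q)).
    """
    q0 = np.asarray(q0, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    q = q0.copy()
    for _ in range(n_iter):
        q = q0 + (h / m) * (p0 - h * grad_V(q))
    p = p0 - h * grad_V(q)
    return q, p

def cost_2c(q, p, q_new, p_new, grad_V, m, h):
    """Symmetric cost (17) of Scheme 2c."""
    q, p, q_new, p_new = (np.asarray(a, dtype=float) for a in (q, p, q_new, p_new))
    dq, dp = q_new - q, p_new - p
    return float(np.sum(dp ** 2)
                 + 12.0 * np.sum((m / h * dq - 0.5 * dp) ** 2)
                 + 2.0 * m * np.sum(dq * (grad_V(q_new) - grad_V(q))))
```

Applying `sigma_h_inverse` to every sample of an empirical approximation of the previous state gives samples of the pushed-forward measure in (19).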



1.7 The main result and the relation to GENERIC 

The main theorem of this paper, Theorem 2.3 below, states that the three new Schemes 2a-c
are indeed approximation schemes for the Kramers equation (1): the discrete-time approximate
solutions constructed using each of these three schemes converge, as $h\to 0$, to the unique solution
of (1).

This statement itself is a relatively uninteresting assertion: it states that the schemes are 
what we claim them to be, approximation schemes. The interest of this paper lies in the fact that 
these three schemes suggest a way towards a generalization of the theory of metric-space gradient 
flows, as developed in [AGS08], to equations like (1) that combine dissipative with conservative 
effects. 

Indeed, the full class of equations and systems that combines dissipative and conservative
effects is extremely large. It contains the Navier-Stokes-Fourier equations (which include heat
generation and transport), systems modelling visco-elasto-plastic materials, relativistic
hydrodynamics, many Boltzmann-type equations, and many other equations describing continuum-mechanical
systems. In fact, the full class of systems covered by the GENERIC formalism [Ott05] is of this
conservative-dissipative type, and indeed the Kramers equation is one of them.

The GENERIC class (General Equation for the Non-Equilibrium Reversible-Irreversible
Coupling) consists of equations for an unknown $x$ in a state space $X$ that can be written as

\[ \dot{x}(t) = J(x)E'(x) + K(x)S'(x). \]

Here $E, S : X\to\mathbb{R}$ are functionals, and $J$, $K$ are operators. A GENERIC system is fully
characterized by $X$, $E$, $S$, $J$, and $K$. In addition, there are certain requirements on these elements,
which include the symmetry conditions

$J$ is antisymmetric and $K$ is symmetric and nonnegative,

and the degeneracy or non-interaction conditions

\[ J(x)S'(x) = 0, \qquad K(x)E'(x) = 0, \qquad\text{for all } x\in X. \]

Because of these properties, along a solution $E$ is constant and $S$ increases. In many systems the
functionals $E$ and $S$ correspond to energy and entropy.
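The two monotonicity statements follow from a short formal computation, sketched here under the assumption that $\langle\cdot,\cdot\rangle$ denotes the duality pairing in which the symmetry conditions on $J$ and $K$ are expressed:

```latex
\begin{align*}
\frac{d}{dt}E(x)
  &= \langle E', \dot{x}\rangle
   = \underbrace{\langle E', J E'\rangle}_{=\,0 \text{ (antisymmetry of } J)}
     + \langle E', K S'\rangle
   = \langle K E', S'\rangle = 0, \\
\frac{d}{dt}S(x)
  &= \langle S', \dot{x}\rangle
   = \langle S', J E'\rangle + \langle S', K S'\rangle
   = -\langle J S', E'\rangle + \langle S', K S'\rangle
   = \langle S', K S'\rangle \;\ge\; 0 .
\end{align*}
```

The first line uses the symmetry of $K$ together with $K E' = 0$; the second uses the antisymmetry of $J$ together with $J S' = 0$, and the nonnegativity of $K$.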

When $F(p) = |p|^2/2m$, the Kramers equation (1) can be cast in this form (see footnote 1). Because of this,
the results of this paper strongly suggest that similar schemes can be constructed for arbitrary
GENERIC systems. We leave this for future study.

1.8 Conclusion and further discussion 

We now make some further comments about the schemes of this paper. 

Value of the three schemes. Scheme 2a is in our opinion interesting because 'it is the right
thing to do': it stays as close as possible to the underlying physics. However, its non-explicit
nature makes it difficult to work with, as the calculations in the proof of Lemma 3.1 illustrate.
Scheme 2b is therefore useful as an approximation of Scheme 2a. Scheme 2c has the advantage of
being formulated in terms of a metric $\widehat{W}_h$, which suggests applicability of metric-space theory.

The linear-friction case $F(p) = |p|^2/2m$. The coefficient $\gamma kT$ in (1) and the coefficient $\sigma :=
\sqrt{2\gamma kT}$ in (2b) are obviously related by $\sigma^2 = 2\gamma kT$. When $F(p) = |p|^2/2m$, the coefficient $\gamma$ is
also the coefficient of linear friction, and this relationship between $\sigma$, $\gamma$, and temperature is the one
given by the fluctuation-dissipation theorem. This guarantees that the Boltzmann distribution

\[ \rho_\infty(q,p) = Z^{-1}\exp\Big(-\frac{1}{kT}\,H(q,p)\Big) \tag{21} \]

is the unique stationary solution of (1). Moreover, the total free energy $\mathcal{E}$ is the relative entropy
with respect to $\rho_\infty$, and it is a Lyapunov functional for the system, as is shown in (6).

When $F$ does not have this specific form, but does have appropriate growth at infinity, then
there still exists a unique stationary solution $\rho_\infty$, which however does not have the convenient
characterization (21). The relative entropy with respect to $\rho_\infty$ is then again a Lyapunov functional.

Connection to ultra-parabolic equations. If $V$ is linear, $V(q) = c\cdot q$, where $c\in\mathbb{R}^d$ is a constant
vector, then $C_h$ coincides with $\overline{C}_h$. In this case, $C_h = \overline{C}_h$ is closely related to the fundamental
solution of the equation

\[ \partial_t\rho(t,q,p) = -\frac{p}{m}\cdot\nabla_q\rho(t,q,p) + c\cdot\nabla_p\rho(t,q,p) + \frac{\sigma^2}{2}\,\Delta_p\rho(t,q,p). \tag{22} \]

Indeed, the fundamental solution $\Gamma_t(q,p;q',p')$ of (22) is given by

\[ \Gamma_t(q,p;q',p') = \alpha_d\,\Big(\frac{m}{\sigma^2 t^2}\Big)^d \exp\Big(-\frac{1}{2\sigma^2 t}\,C_t(q,p;q',p')\Big), \tag{23} \]

where $\alpha_d$ is a normalization constant depending only on $d$. This fact is true for a much more general
linear system and is related to the controllability of the system [DM10]. The appearance
of the rate functional from the Freidlin-Wentzell theory in (23) consolidates the connection of our
approach to the large-deviation principle.

Connection to the isentropic Euler equations. The cost function $\widetilde{C}_h$ has been used in [GW09,
Wes10] to study the system of isentropic Euler equations,

\[ \partial_t\rho + \nabla\cdot(\rho u) = 0, \]
\[ \partial_t u + u\cdot\nabla u = -\nabla U'(\rho), \]

where $U : [0,\infty)\to\mathbb{R}$ is an internal energy density. We now formally show the relationship
between the two equations. Suppose that $\rho(t,q,p)$ is a solution of the Kramers equation (1) with
$F(p) = |p|^2/2m$. We define the macroscopic spatial density and the bulk velocity as

\[ \rho(t,q) = \int \rho(t,q,p)\,dp, \tag{24} \]
\[ u(t,q) = \frac{1}{\rho(t,q)}\int \frac{p}{m}\,\rho(t,q,p)\,dp. \tag{25} \]

Using the so-called moment method, we find that $(\rho,u)$ satisfies the following damped Euler
equations [CSR96, Cha03, CLL04],

\[ \partial_t\rho + \nabla\cdot(\rho u) = 0, \tag{26} \]
\[ \partial_t u + u\cdot\nabla u = -\frac{\beta^{-1}}{m}\,\frac{\nabla\rho}{\rho} - \frac{1}{m}\nabla V - \frac{\gamma}{m}\,u. \tag{27} \]

If $\gamma = 0$ and $V = 0$, these are the isentropic Euler equations with internal energy $U(\rho) = \beta^{-1}\rho\log\rho$.
In [GW09, Wes10], the authors showed that the isentropic Euler equations may be interpreted as a
second-order differential equation in the space of probability measures. They introduced a discrete
approximation scheme, which is similar to Schemes 2a-b, using the cost functional $\widetilde{C}_h$. One
future topic of research is to analyse whether one can approximate other second-order differential
equations in the space of probability measures (e.g., the Schrödinger equation [vR11]), using the
cost function $\widetilde{C}_h$.

Footnote 1: In order to do this, the variable $\rho$ needs to be supplemented with an additional energy variable, that compensates
for the gain and loss in the energy $H$ as a result of the dissipative effects.

Connection to Ambrosio-Gangbo [AG08]. The Hamiltonian step in Scheme 2c is a generalization
of the implicit Euler method for a finite-dimensional Hamiltonian system to an
infinite-dimensional case. It is also compatible with the concept of Hamiltonian flows in the Wasserstein
space of probability measures defined by Ambrosio and Gangbo in [AG08]. Let $\mathcal{H} : \mathcal{P}_2(\mathbb{R}^{2d})\to
(-\infty,+\infty]$ and $\bar\mu\in\mathcal{P}_2(\mathbb{R}^{2d})$ be given. Then $\mu_t : [0,\infty)\to\mathcal{P}_2(\mathbb{R}^{2d})$ is called a Hamiltonian flow
of $\mathcal{H}$ with the initial measure $\bar\mu$ if the following equation holds

\[ \frac{d}{dt}\mu_t = \operatorname{div}_{qp}\big(\mu_t\,J\nabla\mathcal{H}(\mu_t)\big), \qquad \mu_0 = \bar\mu, \qquad t\in(0,T), \]

where $J$ is a skew-symmetric matrix and $\nabla\mathcal{H}(\mu_t)$ is the gradient of the Hamiltonian $\mathcal{H}$ at $\mu_t$
(Definition 3.2 in [AG08]). In particular, when $\mathcal{H}(\mu) = \int_{\mathbb{R}^{2d}}\big(\frac{|p|^2}{2m} + V(q)\big)\,\mu(q,p)\,dq\,dp$ then $\nabla\mathcal{H} =
\big(\nabla_q V(q), \frac{p}{m}\big)^T$. According to Lemma 6.2 in [AG08], when $\bar\mu$ is regular, a Hamiltonian flow in
a small interval $(0,h)$ is constructed by pushing forward the initial measure $\bar\mu$ under the map
$\Phi(t,\cdot) = (q(t),p(t))$ which is the solution of the system (2) (with $\gamma = 0$). In the Hamiltonian step
we approximate this system by the implicit Euler method and take the end point at time $h$.



1.9 Overview of the paper 

The paper is organized as follows. In Section 2, we describe our assumptions and state the 
main result. Section 3 establishes some properties of the three cost functions. The proof of the 
main theorem is given in Sections 4 to 6. In Section 4, we establish the Euler-Lagrange equations
for the minimizers in the three schemes. In Section 5, we prove the boundedness of the second moments
and the entropy functional. Finally, the convergence result is given in Section 6. 






2 Assumptions and main result 

Throughout the paper we make the following assumptions. 

V G C 3 (R d ) and F G C 2 (R d ), > for all x G R d . (28) 

There exists a constant $C > 0$ such that for all $z_1, z_2\in\mathbb{R}^d$

\[ \frac{1}{C}\,|z_1-z_2|^2 \le (z_1-z_2)\cdot\big(\nabla V(z_1)-\nabla V(z_2)\big), \tag{29a} \]
\[ |\nabla V(z_1)-\nabla V(z_2)| \le C|z_1-z_2|, \tag{29b} \]
\[ |\nabla F(z_1)-\nabla F(z_2)| \le C|z_1-z_2|, \tag{29c} \]
\[ |\nabla^2 V(z_1)|,\ |\nabla^3 V(z_1)| \le C. \tag{29d} \]

Note that (29a) implies that $V$ increases quadratically at infinity, and therefore $V$ achieves its
minimum. Without loss of generality we assume that this minimum is at the origin, which implies
the estimate

\[ |\nabla V(z)| \le C|z|. \tag{30} \]

As we remarked in the Introduction, we work in the dimensional setting, and keep all the 
physical constants in place, in order to make the physical background of the expressions clear. We 
make an important exception, however, for inequalities of the type above; here the constants C 
can have any dimension, and we will group terms on the right-hand side of such estimates without 
taking their dimensions into account. This can be done without loss of generality, since we do not 
specify the generic constant C, and this constant will be allowed to vary from one expression to 
the next. 

We only consider probability measures on $\mathbb{R}^{2d}$ which have a Lebesgue density, and we often
tacitly identify a probability measure with its density. We denote by $\mathcal{P}_2(\mathbb{R}^{2d})$ the set of all
probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with finite second moment,

\[ \mathcal{P}_2(\mathbb{R}^{2d}) := \Big\{\rho : \mathbb{R}^d\times\mathbb{R}^d\to[0,\infty) \text{ measurable},\ \int\rho(q,p)\,dq\,dp = 1,\ M_2(\rho) < \infty\Big\}, \]

where

\[ M_2(\rho) = \int_{\mathbb{R}^{2d}} \big(\gamma^2|q|^2 + |p|^2\big)\,\rho(q,p)\,dq\,dp. \tag{31} \]

With these assumptions, the functionals $A$ and $\mathcal{E}$ introduced in the introduction are
well-defined in $\mathcal{P}_2(\mathbb{R}^{2d})$. Moreover, the following two lemmas are now classical (see, e.g., [Vil03,
Theorem 1.3], [JKO98, Proposition 4.1], and [Hua00, Lemma 4.2]). Let $C^*_h$ be one of $C_h$, $\overline{C}_h$, or
$\widehat{C}_h$, defined in (13), (15), and (17), with corresponding optimal-transport cost functional $W^*_h$.

Lemma 2.1. Let $\rho_0,\rho\in\mathcal{P}_2(\mathbb{R}^{2d})$ be given. There exists a unique optimal plan $P^*_{\mathrm{opt}}\in\Gamma(\rho_0,\rho)$
such that

\[ W^*_h(\rho_0,\rho) = \int_{\mathbb{R}^{4d}} C^*_h(q,p;q',p')\,P^*_{\mathrm{opt}}(dq\,dp\,dq'\,dp'). \tag{32} \]

Lemma 2.2. Let $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ be given. If $h$ is small enough, then the minimization problem

\[ \min_{\rho\in\mathcal{P}_2(\mathbb{R}^{2d})}\ \frac{1}{2h\gamma}\,W^*_h(\rho_0,\rho) + A(\rho) \tag{33} \]

has a unique solution.



These lemmas imply that Schemes 2a-c are well-defined. 






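Next, we make the definition of a weak solution precise. A function $\rho\in L^1(\mathbb{R}^+\times\mathbb{R}^{2d})$ is
called a weak solution of equation (1) with initial datum $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ if it satisfies the following
weak formulation of (1):

\[ \int_0^\infty\!\!\int_{\mathbb{R}^{2d}} \Big[\partial_t\varphi + \frac{p}{m}\cdot\nabla_q\varphi - \big(\nabla_q V + \gamma\nabla_p F\big)\cdot\nabla_p\varphi + \gamma\beta^{-1}\Delta_p\varphi\Big]\,\rho\,dq\,dp\,dt = -\int_{\mathbb{R}^{2d}} \varphi(0,q,p)\,\rho_0(q,p)\,dq\,dp, \quad\text{for all } \varphi\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d}). \tag{34} \]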
The main result of the paper is the following. 

Theorem 2.3. Let $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ satisfy $A(\rho_0) < \infty$. For any $h > 0$ sufficiently small, let $\rho^h_k$
be the sequence of the solutions of any of the three Schemes 2a-c. For any $t > 0$, define the
piecewise-constant time interpolation

\[ \rho^h(t,q,p) = \rho^h_k(q,p) \quad\text{for } (k-1)h < t \le kh. \tag{35} \]

Then for any $T > 0$,

\[ \rho^h \rightharpoonup \rho \quad\text{weakly in } L^1((0,T)\times\mathbb{R}^{2d}) \text{ as } h\to 0, \tag{36} \]

where $\rho$ is the unique weak solution of the Kramers equation with initial value $\rho_0$. Moreover

\[ \rho^h(t) \to \rho(t) \quad\text{weakly in } L^1(\mathbb{R}^{2d}) \text{ as } h\to 0 \text{ for any } t > 0, \tag{37} \]

and as $t\to 0$,

\[ \rho(t) \to \rho_0 \quad\text{in } L^1(\mathbb{R}^{2d}). \tag{38} \]

Outline of the proof. The proof follows the procedure of [JKO98] (see also [Hua00, Hua11]) and is
divided into three main steps, which are carried out in Sections 4, 5, and 6: establish the
Euler-Lagrange equation for the minimizers, then estimate the second moments and entropy functionals,
and finally pass to the limit $h\to 0$. We start in Section 3 with some properties of the cost
functions. □



3 Properties of the three cost functions 

Here we derive and summarize a number of properties of the three cost functions. Define the
quadratic form

\[ N(q,p) := |\gamma q|^2 + |p|^2, \]

so that $M_2(\rho) = \int_{\mathbb{R}^{2d}} N(q,p)\,\rho(q,p)\,dq\,dp$.

Lemma 3.1. 1. Let $C^*_h$ be either $C_h$ or $\overline{C}_h$. There exists $C > 0$ such that

\[ |q-q'|^2 + |p-p'|^2 \le C\,\widehat{C}_h(q,p;q',p'), \tag{39a} \]
\[ |q'-q|^2 \le Ch^2\big[C^*_h(q,p;q',p') + N(q,p) + N(q',p')\big], \tag{39b} \]
\[ |p-p'|^2 \le C\big[C^*_h(q,p;q',p') + h^2N(q,p) + h^2N(q',p')\big]. \tag{39c} \]

2. For the cost function $C_h$ of Scheme 2a we have

\[ \nabla_{q'}C_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) - 2h\nabla^2V(q')\cdot p' + \sigma_h(q,p;q',p'), \tag{40a} \]
\[ \nabla_{p'}C_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) + 2h\nabla V(q') + \tau_h(q,p;q',p'), \tag{40b} \]

where there exists $C > 0$ such that

\[ |\sigma_h(q,p;q',p')|,\ |\tau_h(q,p;q',p')| \le Ch\big\{C_h(q,p;q',p') + N(q,p) + N(q',p') + 1\big\}. \tag{41} \]

3. For the cost function $\overline{C}_h$ of Scheme 2b we have

\[ \nabla_{q'}\overline{C}_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big), \tag{42a} \]
\[ \nabla_{p'}\overline{C}_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) + 2h\nabla V(q). \tag{42b} \]

4. For the cost function $\widehat{C}_h$ of Scheme 2c we have

\[ \nabla_{q'}\widehat{C}_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big) + 4m\big(\nabla V(q') - \nabla V(q)\big) + r(q,q'), \tag{43a} \]
\[ \nabla_{p'}\widehat{C}_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big), \tag{43b} \]

where

\[ |r(q,q')| \le Ch^2\big[\widehat{C}_h(q,p;q',p') + N(q,p) + N(q',p')\big]. \tag{44} \]

Proof. For the length of this proof we fix $q$, $p$, $q'$, $p'$, and $h$, and we abbreviate

\[ C_h := C_h(q,p;q',p'), \quad \widetilde{C}_h := \widetilde{C}_h(q,p;q',p'), \quad\text{and}\quad N := N(q,p) + N(q',p') = |\gamma q|^2 + |p|^2 + |\gamma q'|^2 + |p'|^2. \]

Let $\tilde\xi(t)$ and $\xi(t)$, respectively, be the optimal curves in the definition of $\widetilde{C}_h$ in (10) and of $C_h$
in (13). We will need a number of properties of these two curves. All the statements below are
of the following type: there exists $C > 0$ and $0 < h_0 < 1$ such that the property holds for all
$h < h_0$. Here $C$ is always independent of $q$, $p$, $q'$, $p'$, and $h$. The norm $\|\cdot\|_p$ is the $L^p$-norm on the
interval $(0,h)$.

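The curve $\tilde\xi$ satisfies $\tilde\xi^{(4)} = 0$, and hence it is a cubic polynomial

\[ \tilde\xi(t) = q + at + bt^2 + ct^3, \tag{45} \]

where the coefficients can be calculated from the boundary conditions:

\[ a = \frac{p}{m}, \qquad b = \frac{3}{h^2}\Big(q'-q-\frac{ph}{m}\Big) - \frac{p'-p}{mh}, \qquad c = \frac{p'+p}{mh^2} - \frac{2}{h^3}(q'-q). \]

Explicit calculations give

\[ \|\tilde\xi\|_2^2 \le h\,\|\tilde\xi\|_\infty^2 \le ChN, \tag{46} \]
\[ \|\ddot{\tilde\xi}\|_2^2 \le h\,\|\ddot{\tilde\xi}\|_\infty^2 \le C\big\{h^{-3}|q-q'|^2 + h^{-1}|p-p'|^2\big\}, \tag{47} \]
\[ \|\ddot{\tilde\xi}\|_1 \le h\,\|\ddot{\tilde\xi}\|_\infty \le C\big\{h^{-1}|q-q'| + |p-p'|\big\}. \tag{48} \]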

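The curve $\xi(t)$ satisfies the equation

\[ \mathcal{N}(\xi)(t) := m^2\xi^{(4)}(t) + 2m\nabla^2V(\xi)\cdot\ddot\xi(t) + m\nabla^3V(\xi)\cdot\dot\xi\cdot\dot\xi(t) + \nabla^2V(\xi)\cdot\nabla V(\xi)(t) = 0, \tag{49} \]
\[ (\xi,m\dot\xi)(0) = (q,p), \qquad (\xi,m\dot\xi)(h) = (q',p'), \]

where $\nabla^3V$ is the third-order tensor of third derivatives of $V$. This is a relatively benign equation,
but non-trivially nonlinear.

We will need the following four intermediate estimates:

\[ \|\xi\|_2^2 \le ChN, \tag{50} \]
\[ \widetilde{C}_h + h\|\ddot\xi\|_2^2 \le C\{C_h + h^2N\}, \tag{51} \]
\[ \|\dot\xi\|_2^2 \le Ch\{C_h + N\}, \tag{52} \]
\[ \|\xi^{(4)}\|_1 \le C\{C_h + N + 1\}. \tag{53} \]

We first prove (50). Since $\xi$ is optimal in $C_h$,

\begin{align*}
m\|\ddot\xi\|_2 &\le \|m\ddot\xi + \nabla V(\xi)\|_2 + \|\nabla V(\xi)\|_2 \\
 &\overset{(13)}{\le} \|m\ddot{\tilde\xi} + \nabla V(\tilde\xi)\|_2 + \|\nabla V(\xi)\|_2 \\
 &\le m\|\ddot{\tilde\xi}\|_2 + \|\nabla V(\tilde\xi)\|_2 + \|\nabla V(\xi)\|_2 \\
 &\overset{(30)}{\le} m\|\ddot{\tilde\xi}\|_2 + C\big(\|\tilde\xi\|_2 + \|\xi\|_2\big) \\
 &\le m\|\ddot{\tilde\xi}\|_2 + C\big(\|\tilde\xi\|_2 + h^{1/2}\|\xi\|_\infty\big). \tag{54}
\end{align*}

Therefore

\begin{align*}
\|\xi\|_\infty &\le |\xi(0)| + h|\dot\xi(0)| + h^{3/2}\|\ddot\xi\|_2 \\
 &\le |q| + \frac{h}{m}|p| + Ch^{3/2}\big\{\|\ddot{\tilde\xi}\|_2 + \|\tilde\xi\|_2 + h^{1/2}\|\xi\|_\infty\big\}.
\end{align*}

If $h_0$ is small enough, then $Ch^2 \le 1/2$, so that

\[ \|\xi\|_\infty \overset{(46),(47)}{\le} 2|q| + \frac{2h}{m}|p| + C\big\{|q-q'| + h|p-p'| + h^2\sqrt{N}\big\}. \]

Therefore

\[ \|\xi\|_2^2 \le h\,\|\xi\|_\infty^2 \le ChN, \]

which is (50).

Similar to (54) it also follows, since $\xi$ is admissible for $\widetilde{C}_h$, that

\[ \widetilde{C}_h = m^2h\,\|\ddot{\tilde\xi}\|_2^2 \le m^2h\,\|\ddot\xi\|_2^2 \le 2h\,\|m\ddot\xi + \nabla V(\xi)\|_2^2 + 2h\,\|\nabla V(\xi)\|_2^2 \overset{(13),(30)}{\le} 2C_h + Ch\|\xi\|_2^2 \overset{(50)}{\le} 2C_h + Ch^2N, \]

which implies (51).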

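We now can prove part 1 of the Lemma. (39a) is a direct consequence of (17) and (29a). The estimate for $p$ follows from (15) and (30) for $\widetilde C_h$, and from (10) and (51) for $C_h$:

$$|p'-p|^2 \le C\big[|p'-p+h\nabla V(q)|^2 + h^2|\nabla V(q)|^2\big] \le C\big[\widetilde C_h(q,p;q',p') + h^2N\big],$$

$$|p'-p|^2 \le \overline C_h \le C\{C_h + h^2N\}.$$

Similarly,

$$|q'-q|^2 = \frac{h^2}{m^2}\Big|\frac mh(q'-q) - \frac{p+p'}{2} + \frac{p'+p}{2}\Big|^2 \le \frac{2h^2}{m^2}\Big[\Big|\frac mh(q'-q) - \frac{p+p'}{2}\Big|^2 + \frac{|p'+p|^2}{4}\Big] \le Ch^2(\overline C_h + N) \le Ch^2(C_h + N), \tag{55}$$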
and also

$$|q'-q|^2 \le Ch^2(\widetilde C_h + N).$$

Using the Poincaré inequality $\big\|v - \frac1h\int_0^h v\big\|_2 \le Ch\|\dot v\|_2$, the estimate (52) then follows by

$$\|\dot\xi\|_2^2 \le h\Big|\frac1h\int_0^h\dot\xi\Big|^2 + Ch^2\|\ddot\xi\|_2^2 \overset{(51)}{\le} \frac1h|q-q'|^2 + Ch\{C_h + h^2N\} \overset{(39b)}{\le} Ch\{C_h + N\}.$$

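To prove the final of the four intermediate estimates, (53), we define $u = \xi - \tilde\xi$ and remark that

$$m^2\ddddot u = -2m\nabla^2V(\xi)\cdot\ddot\xi - m\nabla^3V(\xi)\cdot\dot\xi\cdot\dot\xi - \nabla^2V(\xi)\cdot\nabla V(\xi). \tag{56}$$

Note that $u = \dot u = 0$ at $t = 0, h$, so that we have $\|u\|_1 \le Ch^4\|\ddddot u\|_1$ and $\|\ddot u\|_1 \le Ch^2\|\ddddot u\|_1$. We then calculate

$$\|\ddddot u\|_1 \overset{(56),(29)}{\le} C\{\|\ddot\xi\|_1 + \|\dot\xi\|_2^2 + \|\xi\|_1 + h\}$$

$$\le C\{\|\ddot{\tilde\xi}\|_1 + \|\dot\xi\|_2^2 + \|\tilde\xi\|_1 + \|\ddot u\|_1 + \|u\|_1 + h\}$$

$$\le C\{\|\ddot{\tilde\xi}\|_1 + \|\dot\xi\|_2^2 + \|\tilde\xi\|_1 + h^2\|\ddddot u\|_1 + h^4\|\ddddot u\|_1 + h\}.$$

Again, taking $h_0$ sufficiently small, we have $C(h^2 + h^4) \le 1/2$, and therefore

$$\|\ddddot u\|_1 \le C\{\|\ddot{\tilde\xi}\|_1 + \|\dot\xi\|_2^2 + \|\tilde\xi\|_1 + h\}$$

$$\overset{(46),(48),(52)}{\le} C\Big\{\frac{|q-q'|}{h} + |p-p'| + hC_h + hN + h\sqrt N + h\Big\}$$

$$\overset{(39b)}{\le} C\big\{\sqrt{C_h + N} + hC_h + N + 1\big\} \le C\{C_h + N + 1\}.$$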



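We now continue with parts 2, 3, and 4. The derivatives of $\widetilde C_h$ can be calculated directly using the explicit expression (15). The derivatives of $C_h$ can be calculated as follows. Let $\eta \in C^2([0,h];\mathbb R^d)$ satisfy $\eta(0) = \dot\eta(0) = 0$. Then

$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\, h\int_0^h \big|m(\ddot\xi + \varepsilon\ddot\eta) + \nabla V(\xi + \varepsilon\eta)\big|^2\,dt = 2h\int_0^h (m\ddot\xi + \nabla V(\xi))\cdot(m\ddot\eta + \nabla^2V(\xi)\cdot\eta)(t)\,dt$$

$$= 2h\int_0^h \mathcal N(\xi)\cdot\eta(t)\,dt + 2h\Big[m\dot\eta\cdot(m\ddot\xi + \nabla V(\xi)) - m\eta\cdot(m\dddot\xi + \nabla^2V(\xi)\cdot\dot\xi)\Big](h).$$

Note that $\mathcal N(\xi) = 0$ by the stationarity (49) of $\xi$. This expression is equal to

$$\nabla_{q'}C_h(q,p;q',p')\cdot\eta(h) + \nabla_{p'}C_h(q,p;q',p')\cdot m\dot\eta(h),$$

which allows us to identify the two derivatives in terms of $\xi$. Setting $u = \xi - \tilde\xi$, we rewrite these in terms of $u$: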

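$$\nabla_{q'}C_h(q,p;q',p') = -2hm^2\dddot\xi(h) - 2hm\nabla^2V(\xi(h))\cdot\dot\xi(h)$$

$$= -2hm^2\dddot{\tilde\xi}(h) - 2hm\nabla^2V(\xi(h))\cdot\dot\xi(h) - 2hm^2\big(\dddot\xi(h) - \dddot{\tilde\xi}(h)\big)$$

$$\overset{(45)}{=} \frac{24m}{h}\Big(\frac mh(q'-q) - \frac{p'+p}{2}\Big) - 2h\nabla^2V(q')\cdot p' - 2hm^2\dddot u(h),$$

$$\nabla_{p'}C_h(q,p;q',p') = 2hm\ddot\xi(h) + 2h\nabla V(\xi(h))$$

$$= 2hm\ddot{\tilde\xi}(h) + 2h\nabla V(\xi(h)) + 2hm\big(\ddot\xi(h) - \ddot{\tilde\xi}(h)\big)$$

$$\overset{(45)}{=} 2(p'-p) - 12\Big(\frac mh(q'-q) - \frac{p'+p}{2}\Big) + 2h\nabla V(q') + 2hm\ddot u(h).$$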
Therefore (40) holds with

$$\sigma_h = -2hm^2\dddot u(h) \quad\text{and}\quad \tau_h = 2hm\ddot u(h).$$

The estimates (41) then follow from (53) and the inequalities

$$\|\ddot u\|_\infty \le h\|\dddot u\|_\infty \le Ch\|\ddddot u\|_1,$$

which hold since $u = \dot u = 0$ at $t = 0, h$.

The derivatives of $\widehat C_h$ are given by (43), where

$$r(q,q') := 2m\big[\nabla^2V(q')\cdot(q'-q) - \nabla V(q') + \nabla V(q)\big].$$
The estimate (44) on r follows from (29d), (55), and the fact that by (29a), Ch < Ch- □ 

4 The Euler-Lagrange equation for the minimization problem

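Let $C_h^*$ be one of $C_h$, $\widetilde C_h$, or $\widehat C_h$, defined in (13), (15), and (17), with corresponding optimal-transport cost functional $W_h^*$. Let $\bar\rho \in \mathcal P_2(\mathbb R^{2d})$ be given and let $\rho$ be the unique solution of the minimization problem

$$\min_{\rho \in \mathcal P_2(\mathbb R^{2d})} \frac{1}{2\gamma h}W_h^*(\bar\rho,\rho) + \mathcal A(\rho).$$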

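We now establish the Euler-Lagrange equation for $\rho$. Following the now well-established route (see e.g. [JKO98, Hua00]), we first define a perturbation of $\rho$ by a push-forward under an appropriate flow. Let $\phi, \eta \in C_c^\infty(\mathbb R^{2d}, \mathbb R^d)$. We define the flows $\Psi, \Phi : [0,\infty) \times \mathbb R^{2d} \to \mathbb R^d$ such that

$$\frac{\partial\Psi_s}{\partial s} = \phi(\Psi_s,\Phi_s), \qquad \frac{\partial\Phi_s}{\partial s} = \eta(\Psi_s,\Phi_s),$$

$$\Psi_0(q,p) = q, \qquad \Phi_0(q,p) = p.$$

Let $\rho_s(q,p)$ be the push forward of $\rho(q,p)$ under the flow $(\Psi_s,\Phi_s)$, i.e., for any $\varphi \in C_c^\infty(\mathbb R^{2d},\mathbb R)$ we have

$$\int_{\mathbb R^{2d}} \varphi(q,p)\rho_s(q,p)\,dqdp = \int_{\mathbb R^{2d}} \varphi(\Psi_s(q,p),\Phi_s(q,p))\rho(q,p)\,dqdp. \tag{57}$$

Obviously $\rho_0(q,p) = \rho(q,p)$, and an explicit calculation gives

$$\partial_s\rho_s\big|_{s=0} = -\mathrm{div}_q\,\rho\phi - \mathrm{div}_p\,\rho\eta \quad\text{in the sense of distributions.} \tag{58}$$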
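By following the calculations in e.g. [Hua00] we then compute the stationarity condition on $\rho$,

$$0 = \frac{1}{2\gamma h}\int_{\mathbb R^{4d}} \big[\nabla_{q'}C_h^*(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h^*(q,p;q',p')\cdot\eta(q',p')\big]\,P_h^*(dqdpdq'dp')$$

$$\quad + \int_{\mathbb R^{2d}} \rho(q,p)\nabla_pF(p)\cdot\eta(q,p)\,dqdp - \beta^{-1}\int_{\mathbb R^{2d}} \rho(q,p)\big[\mathrm{div}_q\phi(q,p) + \mathrm{div}_p\eta(q,p)\big]\,dqdp, \tag{59}$$

where $P_h^*$ is optimal in $W_h^*(\bar\rho,\rho)$. For any $\varphi \in C_c^\infty(\mathbb R^{2d},\mathbb R)$, we choose

$$\phi(q',p') = -\frac{\gamma h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{\gamma h}{2m}\nabla_{p'}\varphi(q',p'),$$

$$\eta(q',p') = -\frac{\gamma h}{2m}\nabla_{q'}\varphi(q',p') + \gamma\nabla_{p'}\varphi(q',p'). \tag{60}$$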

Now the specific form of the cost functional $C_h^*(q,p;q',p')$ comes into play. We calculate the gradient expression in (59) for each scheme in the next subsections.

Remark 4.1. The structure of the choice (60) can be understood in terms of the conservative-dissipative nature of the Kramers equation. The matrix in front of $\nabla\varphi(q',p')$ in (60) is of the form

$$\begin{pmatrix} -\frac{\gamma h^2}{6m^2}I & \frac{\gamma h}{2m}I \\ -\frac{\gamma h}{2m}I & \gamma I \end{pmatrix} = \underbrace{\gamma\begin{pmatrix} -\frac{h^2}{6m^2}I & 0 \\ 0 & I \end{pmatrix}}_{A} + \underbrace{\frac{\gamma h}{2m}\begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}}_{B}.$$

Note that $A$ is symmetric and $B$ is antisymmetric: this mirrors the conservative-dissipative structure of the Kramers equation.

The top-left block in $A$, which would correspond to diffusion in the spatial variable $q$, is of order $O(h^2)$, and therefore vanishes when $h \to 0$. The other block, which corresponds to diffusion in the momentum variable $p$, is of order $O(1)$ and remains. This explains how in the limit $h \to 0$ only diffusion in the momentum variable remains.
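The splitting described in Remark 4.1 can be reproduced numerically. The sketch below is an added illustration with hypothetical function names; it assumes $d = 1$, so that each block is a scalar, and takes the coefficients of $\nabla_{q'}\varphi$ and $\nabla_{p'}\varphi$ as read off from (60). It splits the matrix into its symmetric and antisymmetric parts and prints them for two step sizes.

```python
from fractions import Fraction as Fr

def coefficient_matrix(gamma, m, h):
    """2x2 matrix (d = 1) multiplying (d_q' varphi, d_p' varphi) in (60),
    as assumed in this sketch."""
    return [[-gamma * h**2 / (6 * m**2), gamma * h / (2 * m)],
            [-gamma * h / (2 * m), gamma]]

def split(M):
    """Return the symmetric part and the antisymmetric part of M."""
    n = len(M)
    sym = [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]
    anti = [[(M[i][j] - M[j][i]) / 2 for j in range(n)] for i in range(n)]
    return sym, anti

gamma, m = Fr(2), Fr(3)  # hypothetical parameter values
for h in (Fr(1, 10), Fr(1, 100)):
    A, B = split(coefficient_matrix(gamma, m, h))
    # The q-q entry of A scales like h^2, the p-p entry stays equal to gamma.
    print("h =", h, " A =", A, " B =", B)
```

Reducing $h$ by a factor of 10 reduces the top-left entry of the symmetric part by a factor of 100, while the bottom-right entry is unchanged.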



4.1 Schemes 2a and 2b 

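Lemma 4.2. Let $h > 0$ and let $\{\rho_k^h\}$ be the sequence of the minimizers either for problem (14) in Scheme 2a or for problem (16) in Scheme 2b. Let $W_h^*$ be $W_h$ for Scheme 2a and $\widetilde W_h$ for Scheme 2b, and let $P_k^{h*}$ be optimal in $W_h^*(\rho_{k-1}^h,\rho_k^h)$. Then, for all $\varphi \in C_c^\infty(\mathbb R^{2d})$, there holds

$$0 = \frac1h\int_{\mathbb R^{4d}} \big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\big]\,P_k^{h*}(dqdpdq'dp')$$

$$\quad - \frac1m\int_{\mathbb R^{2d}} p'\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp' + \int_{\mathbb R^{2d}} \nabla V(q')\cdot\nabla_{p'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp'$$

$$\quad + \gamma\int_{\mathbb R^{2d}} \big[\nabla F(p')\cdot\nabla_{p'}\varphi(q',p') - \beta^{-1}\Delta_{p'}\varphi(q',p')\big]\,\rho_k^h(q',p')\,dq'dp' + \omega_k^h, \tag{61}$$

where

$$|\omega_k^h| \le Ch\big[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\big].$$

The second moment $M_2$ is defined in (31).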



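Proof. For Scheme 2b we combine (60) with (42) to yield

$$\nabla_{q'}\widetilde C_h(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}\widetilde C_h(q,p;q',p')\cdot\eta(q',p')$$

$$= 2\gamma\Big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm\,p'\cdot\nabla_{q'}\varphi(q',p')\Big]$$

$$\quad + 2\gamma\nabla V(q)\cdot\Big[-\frac{h^2}{2m}\nabla_{q'}\varphi(q',p') + h\nabla_{p'}\varphi(q',p')\Big]. \tag{62}$$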
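Identity (62) is pure algebra and can be confirmed with exact arithmetic. The sketch below is an added illustration under stated assumptions: one space dimension, the explicit Scheme 2b cost taken to be $|p'-p+h\nabla V(q)|^2 + 12|\frac mh(q'-q) - \frac{p'+p}{2}|^2$, the fields $\phi$, $\eta$ as in (60), and the numbers `a`, `b` standing for the values of $\nabla_{q'}\varphi$ and $\nabla_{p'}\varphi$ at a point. All function names are hypothetical.

```python
from fractions import Fraction as Fr

def cost_2b(q, p, q1, p1, m, h, dV):
    """Assumed explicit Scheme 2b cost in one dimension."""
    return (p1 - p + h * dV(q)) ** 2 + 12 * (m / h * (q1 - q) - (p1 + p) / 2) ** 2

def lhs_62(q, p, q1, p1, m, h, gamma, dV, a, b):
    """grad_{q'} C * phi + grad_{p'} C * eta, with phi and eta as in (60)."""
    # The cost is quadratic in (q1, p1), so central differences are exact.
    dq1 = (cost_2b(q, p, q1 + 1, p1, m, h, dV) - cost_2b(q, p, q1 - 1, p1, m, h, dV)) / 2
    dp1 = (cost_2b(q, p, q1, p1 + 1, m, h, dV) - cost_2b(q, p, q1, p1 - 1, m, h, dV)) / 2
    phi = -gamma * h**2 / (6 * m**2) * a + gamma * h / (2 * m) * b
    eta = -gamma * h / (2 * m) * a + gamma * b
    return dq1 * phi + dp1 * eta

def rhs_62(q, p, q1, p1, m, h, gamma, dV, a, b):
    """Right-hand side of (62) in one dimension."""
    return (2 * gamma * ((q1 - q) * a + (p1 - p) * b - h / m * p1 * a)
            + 2 * gamma * dV(q) * (-h**2 / (2 * m) * a + h * b))

dV = lambda x: x**3 + x  # hypothetical potential gradient
args = (Fr(1, 2), Fr(-3, 4), Fr(2, 3), Fr(5, 6), Fr(7, 5), Fr(1, 8), Fr(9, 4), dV, Fr(-2, 7), Fr(3, 11))
print(lhs_62(*args) == rhs_62(*args))
```

The check does not depend on the particular potential, since only the single value $\nabla V(q)$ enters both sides.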
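Substituting (60) and (62) into the Euler-Lagrange equation (59), we obtain

$$0 = \frac1h\int_{\mathbb R^{4d}} \big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\big]\,P_k^h(dqdpdq'dp')$$

$$\quad - \frac1m\int_{\mathbb R^{2d}} p'\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp' + \int_{\mathbb R^{4d}} \nabla V(q)\cdot\nabla_{p'}\varphi(q',p')\,P_k^h(dqdpdq'dp')$$

$$\quad + \gamma\int_{\mathbb R^{2d}} \Big[\nabla F(p')\cdot\nabla_{p'}\varphi(q',p') + \beta^{-1}\frac{h^2}{6m^2}\Delta_{q'}\varphi(q',p') - \beta^{-1}\Delta_{p'}\varphi(q',p')\Big]\,\rho_k^h(q',p')\,dq'dp'$$

$$\quad - \frac{h}{2m}\int_{\mathbb R^{4d}} \big[\nabla V(q) + \gamma\nabla F(p')\big]\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dqdpdq'dp').$$

Therefore (61) holds with

$$\omega_k^h = \int_{\mathbb R^{4d}} (\nabla V(q) - \nabla V(q'))\cdot\nabla_{p'}\varphi(q',p')\,P_k^h(dqdpdq'dp') + \gamma\beta^{-1}\frac{h^2}{6m^2}\int_{\mathbb R^{2d}} \Delta_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp'$$

$$\quad - \frac{h}{2m}\int_{\mathbb R^{4d}} \big[\nabla V(q) + \gamma\nabla F(p')\big]\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dqdpdq'dp'), \tag{63}$$

which we estimate as

$$|\omega_k^h| \overset{(29b),(29c)}{\le} C\int_{\mathbb R^{4d}} \big[|q-q'| + h(|q| + |p'| + 1)\big]\,P_k^h(dqdpdq'dp')$$

$$\le C\int_{\mathbb R^{4d}} \Big[\frac1h|q-q'|^2 + h(|q|^2 + |p'|^2 + 1)\Big]\,P_k^h(dqdpdq'dp')$$

$$\overset{(39)}{\le} Ch\big[\widetilde W_h(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\big].$$

This proves Lemma 4.2 for Scheme 2b.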

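For Scheme 2a we obtain an identity similar to (62),

$$\nabla_{q'}C_h(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h(q,p;q',p')\cdot\eta(q',p')$$

$$= 2\gamma\Big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm\,p'\cdot\nabla_{q'}\varphi(q',p')\Big]$$

$$\quad + 2\gamma\Big\{h\nabla V(q') + \frac12\tau_h(q,p;q',p')\Big\}\cdot\Big[-\frac{h}{2m}\nabla_{q'}\varphi(q',p') + \nabla_{p'}\varphi(q',p')\Big]$$

$$\quad + 2\gamma\Big[-h\nabla^2V(q')\cdot p' + \frac12\sigma_h(q,p;q',p')\Big]\cdot\Big[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Big].$$

This leads to the same equation as (61), but now with error term

$$\omega_k^h = -\frac{h}{2m}\int_{\mathbb R^{4d}} \nabla V(q')\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dqdpdq'dp')$$

$$\quad + \int_{\mathbb R^{4d}} \Big[-\nabla^2V(q')\cdot p' + \frac{1}{2h}\sigma_h(q,p;q',p')\Big]\cdot\Big[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Big]\,P_k^h(dqdpdq'dp')$$

$$\quad + \frac{1}{2h}\int_{\mathbb R^{4d}} \tau_h(q,p;q',p')\cdot\Big[-\frac{h}{2m}\nabla_{q'}\varphi(q',p') + \nabla_{p'}\varphi(q',p')\Big]\,P_k^h(dqdpdq'dp')$$

$$\quad - \frac{\gamma h}{2m}\int_{\mathbb R^{2d}} \nabla F(p')\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp' + \gamma\beta^{-1}\frac{h^2}{6m^2}\int_{\mathbb R^{2d}} \Delta_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'dp'.$$

We estimate this error as follows, using the notation of the proof of Lemma 3.1:

$$|\omega_k^h| \le C\int_{\mathbb R^{4d}} \Big\{h(1+|q'|) + h|p'| + |\sigma_h| + \frac1h|\tau_h| + h(1+|p'|) + h^2\Big\}\,P_k^h$$

$$\le C\int_{\mathbb R^{4d}} \big\{h(1 + |q'|^2 + |p'|^2) + h\,[C_h + N + 1]\big\}\,P_k^h$$

$$\le Ch\int_{\mathbb R^{4d}} [C_h + N + 1]\,P_k^h$$

$$\le Ch\big[W_h(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\big].$$

This concludes the proof of Lemma 4.2. □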



4.2 Scheme 2c 

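Lemma 4.3. Let $h > 0$ and let $\{\mu_k^h\}$ and $\{\rho_k^h\}$ be the sequences constructed in Scheme 2c. Let $P_k^h(dqdpdq'dp')$ be the optimal plan in the definition of $\widehat W_h(\mu_k^h,\rho_k^h)$. Then, for all $\varphi \in C_c^\infty(\mathbb R^{2d})$, there holds

$$0 = \frac1h\int_{\mathbb R^{4d}} \Big[\Big(q'-q+\frac pm h\Big)\cdot\nabla_{q'}\varphi(q',p') + \big(p'-p-h\nabla_qV(q)\big)\cdot\nabla_{p'}\varphi(q',p')\Big]\,P_k^h(dqdpdq'dp')$$

$$\quad - \frac1m\int_{\mathbb R^{2d}} p\cdot\nabla_q\varphi(q,p)\,\rho_k^h(dqdp) + \int_{\mathbb R^{2d}} \nabla V(q)\cdot\nabla_p\varphi(q,p)\,\rho_k^h(q,p)\,dqdp$$

$$\quad + \gamma\int_{\mathbb R^{2d}} \big[\nabla F(p)\cdot\nabla_p\varphi(q,p) - \beta^{-1}\Delta_p\varphi(q,p)\big]\,\rho_k^h(q,p)\,dqdp + \omega_k^h, \tag{64}$$

where

$$|\omega_k^h| \le Ch\big[h\widehat W_h(\mu_k^h,\rho_k^h) + M_2(\mu_k^h) + M_2(\rho_k^h) + 1\big].$$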

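Proof. From (60) and (43) we obtain

$$\nabla_{q'}\widehat C_h(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}\widehat C_h(q,p;q',p')\cdot\eta(q',p')$$

$$= 2\gamma\Big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm(p'-p)\cdot\nabla_{q'}\varphi(q',p')\Big]$$

$$\quad + \gamma\big[4m(\nabla V(q') - \nabla V(q)) + r(q,q')\big]\cdot\Big[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Big]. \tag{65}$$

Substituting (60) and (65) into the Euler-Lagrange equation (59), we obtain

$$0 = \frac1h\int_{\mathbb R^{4d}} \big[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\big]\,P_k^h(dqdpdq'dp')$$

$$\quad - \frac1m\int_{\mathbb R^{4d}} (p'-p)\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dqdpdq'dp') + \int_{\mathbb R^{4d}} (\nabla V(q') - \nabla V(q))\cdot\nabla_{p'}\varphi(q',p')\,P_k^h(dqdpdq'dp')$$

$$\quad + \gamma\int_{\mathbb R^{2d}} \big[\nabla F(p)\cdot\nabla_p\varphi(q,p) - \beta^{-1}\Delta_p\varphi(q,p)\big]\,\rho_k^h(q,p)\,dqdp + \omega_k^h, \tag{66}$$

where we estimate the remainder, again using the notation of the proof of Lemma 3.1,

$$\omega_k^h = -\frac{h}{3m}\int_{\mathbb R^{4d}} (\nabla V(q') - \nabla V(q))\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dqdpdq'dp')$$

$$\quad + \frac{1}{2h}\int_{\mathbb R^{4d}} r(q,q')\cdot\Big[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Big]\,P_k^h(dqdpdq'dp')$$

$$\quad - \frac{\gamma h}{2m}\int_{\mathbb R^{2d}} \rho_k^h(q,p)\nabla F(p)\cdot\nabla_q\varphi(q,p)\,dqdp + \gamma\beta^{-1}\frac{h^2}{6m^2}\int_{\mathbb R^{2d}} \rho_k^h(q,p)\Delta_q\varphi(q,p)\,dqdp,$$

$$|\omega_k^h| \overset{(29),(44)}{\le} C\int_{\mathbb R^{4d}} \big[h|q'-q| + h^2(\widehat C_h + N) + h(1+|p'|) + h^2\big]\,P_k^h(dqdpdq'dp')$$

$$\le C\int_{\mathbb R^{4d}} \big[h(|q|^2 + |q'|^2) + h^2(\widehat C_h + N) + h(1+|p'|^2)\big]\,P_k^h(dqdpdq'dp')$$

$$\le Ch\big[h\widehat W_h(\mu_k^h,\rho_k^h) + M_2(\mu_k^h) + M_2(\rho_k^h) + 1\big].$$

This concludes the proof of Lemma 4.3. □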

5 A priori estimate: Boundedness of the second moment and entropy

This section includes some technical lemmas that are needed in order to prove the convergence 
result of Section 6. 

Lemma 5.1. Let $\{\rho_k^h\}_{k\ge1}$ be the sequence of the minimizers of Scheme 2a or Scheme 2b for fixed $h > 0$. Then for any positive integer $n$ and sufficiently small $h$, we have

$$\sum_{k=1}^n W_h^*(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\big(\mathcal A(\rho_0) - \mathcal A(\rho_n^h)\big) + Ch^2\sum_{k=0}^n M_2(\rho_k^h) + Cnh^2, \tag{67}$$

for some constant $C > 0$ independent of $n$, where $W_h^*$ is either $W_h$ or $\widetilde W_h$. Similarly, if $\{\mu_k^h\}$ and $\{\rho_k^h\}$ are the sequences constructed in Scheme 2c, then

$$\sum_{k=1}^n \widehat W_h(\mu_k^h,\rho_k^h) \le 2\gamma h\big(\mathcal A(\rho_0) - \mathcal A(\rho_n^h)\big) + Ch^2\sum_{k=0}^n M_2(\rho_k^h) + Cnh^2.$$

Proof. We give the details for Scheme 2a and then comment on the differences for the other schemes. We first define the operator $s_h : \mathbb R^{2d} \to \mathbb R^{2d}$ as the solution operator over time $h$ for the Hamiltonian system

$$Q' = \frac Pm, \qquad P' = -\nabla V(Q), \tag{68}$$

that is, $s_h(q,p)$ is the solution at time $h$ given the initial datum $(q,p)$ at time zero. The operator $s_h$ is bijective and volume-preserving.
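Volume preservation of the flow of (68) can be observed numerically. The sketch below is an added illustration, not the construction used in the proof: it replaces the exact solution operator $s_h$ by a leapfrog discretisation (which is itself a composition of shear maps and hence volume-preserving), works in one space dimension with a hypothetical potential, and estimates the Jacobian determinant of the resulting map by central finite differences.

```python
def leapfrog(q, p, h, m, dV, steps=50):
    """Approximate s_h(q, p) for Q' = P/m, P' = -V'(Q) by the leapfrog scheme."""
    dt = h / steps
    for _ in range(steps):
        p -= 0.5 * dt * dV(q)   # half kick
        q += dt * p / m         # drift
        p -= 0.5 * dt * dV(q)   # half kick
    return q, p

def jacobian_determinant(flow, q, p, eps=1e-6):
    """Central finite-difference estimate of det D(flow) at the point (q, p)."""
    q_a, p_a = flow(q + eps, p)
    q_b, p_b = flow(q - eps, p)
    q_c, p_c = flow(q, p + eps)
    q_d, p_d = flow(q, p - eps)
    dq_dq, dp_dq = (q_a - q_b) / (2 * eps), (p_a - p_b) / (2 * eps)
    dq_dp, dp_dp = (q_c - q_d) / (2 * eps), (p_c - p_d) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

dV = lambda x: x + x**3  # hypothetical potential V(q) = q^2/2 + q^4/4
flow = lambda q, p: leapfrog(q, p, h=0.1, m=1.5, dV=dV)
det = jacobian_determinant(flow, 0.7, -0.3)
print(abs(det - 1.0) < 1e-6)
```

A determinant equal to one, up to finite-difference error, is the pointwise statement behind the change of variables used for the entropy term.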

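For any fixed $k \ge 1$, $\rho_k^h$ minimizes the functional $(2h\gamma)^{-1}W_h(\rho_{k-1}^h,\rho) + \mathcal A(\rho)$ over $\rho \in \mathcal P_2(\mathbb R^{2d})$, i.e.,

$$W_h(\rho_{k-1}^h,\rho_k^h) + 2h\gamma\mathcal A(\rho_k^h) \le W_h(\rho_{k-1}^h,\rho) + 2h\gamma\mathcal A(\rho), \tag{69}$$

for every $\rho \in \mathcal P_2(\mathbb R^{2d})$. In particular by taking $\rho = (s_h^{-1})_\#\rho_{k-1}^h =: \rho_k^*$, for which $W_h(\rho_{k-1}^h,\rho_k^*) = 0$, it follows that

$$W_h(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\big[\mathcal A(\rho_k^*) - \mathcal A(\rho_k^h)\big] = 2\gamma h\big[\mathcal F(\rho_k^*) - \mathcal F(\rho_k^h)\big] + 2\gamma h\big[S(\rho_k^*) - S(\rho_k^h)\big]. \tag{70}$$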
We now estimate each term on the right hand side. Write $(\bar q,\bar p) = s_h(q,p)$. Using equation (68) we readily estimate that the solution $(Q(t),P(t))$ starting at $(q,p)$ and ending at $(\bar q,\bar p)$ satisfies $\|Q\|_\infty \le C(|q| + h|p|)$, and therefore

$$\Big|\int_0^h \nabla V(Q(t))\,dt\Big| \le h\sup_{t\in[0,h]}|\nabla V(Q(t))| \le Ch\|Q\|_\infty \le Ch(|q| + h|p|),$$

so that

$$F(\bar p) = F\Big(p - \int_0^h \nabla V(Q(t))\,dt\Big) \overset{(28),(29c)}{\le} F(p) + C(|p|+1)\Big|\int_0^h \nabla V(Q(t))\,dt\Big| + C\Big|\int_0^h \nabla V(Q(t))\,dt\Big|^2$$

$$\le F(p) + Ch(|p|+1)(|q| + h|p|) + Ch^2(|q| + h|p|)^2 \le F(p) + Ch[N(q,p) + 1].$$

Therefore

$$\mathcal F(\rho_k^*) = \int_{\mathbb R^{2d}} F(p)\rho_k^*(q,p)\,dqdp = \int_{\mathbb R^{2d}} F(\bar p)\rho_{k-1}^h(q,p)\,dqdp$$

$$\le \int_{\mathbb R^{2d}} \big(F(p) + ChN(q,p) + Ch\big)\rho_{k-1}^h(q,p)\,dqdp \le \mathcal F(\rho_{k-1}^h) + ChM_2(\rho_{k-1}^h) + Ch. \tag{71}$$
For the entropy term, we have, since $s_h$ is volume-preserving and bijective,

$$S(\rho_k^*) = \beta^{-1}\int_{\mathbb R^{2d}} \rho_k^*(q,p)\log\rho_k^*(q,p)\,dqdp = \beta^{-1}\int_{\mathbb R^{2d}} \rho_{k-1}^h(s_h(q,p))\log\rho_{k-1}^h(s_h(q,p))\,dqdp = S(\rho_{k-1}^h). \tag{72}$$

From (70), (71), and (72), we obtain

$$W_h(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\big(\mathcal A(\rho_{k-1}^h) - \mathcal A(\rho_k^h)\big) + Ch^2M_2(\rho_{k-1}^h) + Ch^2.$$

Summing over $k = 1$ to $n$ we obtain (67).

For Scheme 2b, the equation (68) only modifies slightly, in that the acceleration becomes constant:

$$Q' = \frac Pm, \qquad P' = -\nabla V(q).$$

Similar estimates lead to the same result.

For Scheme 2c, the proof is again similar, by taking $\rho_k^* := \mu_k^h$ and estimating the difference $\mathcal A(\mu_k^h) - \mathcal A(\rho_{k-1}^h)$ as is done above. □

Lemma 5.2. There exist positive constants $T_0$, $h_0$, and $C$, independent of the initial data, such that for any $0 < h \le h_0$, the solutions $\{\rho_k^h\}_{k\ge1}$ for Scheme 2a, Scheme 2b, or Scheme 2c, satisfy

$$M_2(\rho_k^h) \le C[M_2(\rho_0) + 1] \quad\text{and}\quad |S(\rho_k^h)| \le C[S(\rho_0) + M_2(\rho_0) + 1] \quad\text{for any } k \le K_0, \tag{73}$$

where $K_0 = \lceil T_0/h\rceil$.

Proof. We detail the proof for Scheme 2a; the modifications for Schemes 2b and 2c are very minor.
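For a fixed $i$, let $P_i^h \in \Gamma(\rho_{i-1}^h,\rho_i^h)$ be the optimal plan in the definition of $W_h(\rho_{i-1}^h,\rho_i^h)$. We have

$$\Big(\int_{\mathbb R^{2d}} |p|^2\rho_i^h(q,p)\,dqdp\Big)^{1/2} = \Big(\int_{\mathbb R^{4d}} |p'|^2P_i^h(dqdpdq'dp')\Big)^{1/2} \le \Big(\int_{\mathbb R^{4d}} |p'-p|^2P_i^h(dqdpdq'dp')\Big)^{1/2} + \Big(\int_{\mathbb R^{4d}} |p|^2P_i^h(dqdpdq'dp')\Big)^{1/2}.$$

By (39c), we estimate

$$\Big(\int_{\mathbb R^{4d}} |p'-p|^2P_i^h(dqdpdq'dp')\Big)^{1/2} \le CW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\big[M_2(\rho_i^h)^{1/2} + M_2(\rho_{i-1}^h)^{1/2}\big],$$

and hence,

$$\Big(\int_{\mathbb R^{2d}} |p|^2\rho_i^h(q,p)\,dqdp\Big)^{1/2} \le \Big(\int_{\mathbb R^{2d}} |p|^2\rho_{i-1}^h(q,p)\,dqdp\Big)^{1/2} + CW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\big[M_2(\rho_i^h)^{1/2} + M_2(\rho_{i-1}^h)^{1/2}\big].$$

Summing over $i$ from 1 to $k$ we obtain

$$\Big(\int_{\mathbb R^{2d}} |p|^2\rho_k^h(q,p)\,dqdp\Big)^{1/2} \le C\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\sum_{i=1}^k M_2(\rho_i^h)^{1/2} + CM_2(\rho_0)^{1/2}.$$

Therefore

$$\int_{\mathbb R^{2d}} |p|^2\rho_k^h(q,p)\,dqdp \le C\Big(\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2}\Big)^2 + Ch^2\Big(\sum_{i=1}^k M_2(\rho_i^h)^{1/2}\Big)^2 + CM_2(\rho_0)$$

$$\le Ck\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0). \tag{74}$$

Similarly, we use (55) and the fact that

$$q' = q + \frac{h}{2\sqrt3\,m}\cdot 2\sqrt3\Big(\frac mh(q'-q) - \frac{p+p'}{2}\Big) + \frac{h}{2m}(p + p')$$

to derive that

$$\Big(\int_{\mathbb R^{2d}} |q|^2\rho_i^h(q,p)\,dqdp\Big)^{1/2} = \Big(\int_{\mathbb R^{4d}} |q'|^2P_i^h(dqdpdq'dp')\Big)^{1/2}$$

$$\le \frac{h}{2\sqrt3\,m}\Big(\int_{\mathbb R^{4d}} 12\Big|\frac mh(q'-q) - \frac{p+p'}{2}\Big|^2P_i^h(dqdpdq'dp')\Big)^{1/2} + \frac{h}{2m}\Big(\int_{\mathbb R^{4d}} |p'|^2P_i^h(dqdpdq'dp')\Big)^{1/2}$$

$$\quad + \frac{h}{2m}\Big(\int_{\mathbb R^{4d}} |p|^2P_i^h(dqdpdq'dp')\Big)^{1/2} + \Big(\int_{\mathbb R^{2d}} |q|^2\rho_{i-1}^h(q,p)\,dqdp\Big)^{1/2}$$

$$\le ChW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\big[M_2(\rho_{i-1}^h)^{1/2} + M_2(\rho_i^h)^{1/2}\big] + \Big(\int_{\mathbb R^{2d}} |q|^2\rho_{i-1}^h(q,p)\,dqdp\Big)^{1/2}.$$

Summing over $i$ from 1 to $k$, we obtain

$$\Big(\int_{\mathbb R^{2d}} |q|^2\rho_k^h(q,p)\,dqdp\Big)^{1/2} \le Ch\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\sum_{i=1}^k M_2(\rho_i^h)^{1/2} + CM_2(\rho_0)^{1/2},$$

and therefore,

$$\int_{\mathbb R^{2d}} |q|^2\rho_k^h(q,p)\,dqdp \le Ckh^2\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0). \tag{75}$$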
From (74) and (75) it holds that

$$M_2(\rho_k^h) = \int_{\mathbb R^{2d}} (|q|^2 + |p|^2)\rho_k^h(q,p)\,dqdp \le Ck\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0).$$

Applying Lemma 5.1 with $n = k$, it follows that

$$M_2(\rho_k^h) \le Ck\Big[2\gamma h\big(\mathcal A(\rho_0) - \mathcal A(\rho_k^h)\big) + Ch^2\sum_{i=0}^k M_2(\rho_i^h) + Ckh^2\Big] + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0)$$

$$\le -CkhS(\rho_k^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0) + Ckh\mathcal A(\rho_0) + Ck^2h^2. \tag{76}$$

By inequality (29) in [JKO98], $S(\rho_k^h)$ is bounded from below by $M_2(\rho_k^h)$,

$$S(\rho_k^h) \ge -C - CM_2(\rho_k^h). \tag{77}$$

Substituting (77) into (76) we have

$$M_2(\rho_k^h) \le C_1kh^2\sum_{i=1}^k M_2(\rho_i^h) + C_1khM_2(\rho_k^h) + C_1(k^2h^2 + 1) + C_1M_2(\rho_0), \tag{78}$$

where we fix the constant $C_1$, and use it to set the time horizon $T_0$:

$$T_0 := \frac{1}{4C_1}, \qquad K_0 := \Big\lceil\frac{T_0}{h}\Big\rceil. \tag{79}$$

We emphasize that $C_1$, and hence $T_0$, is independent of the initial data. We now choose $h_0 \le T_0$ so small that for all $h \le h_0$ we have $K_0h \le 2T_0$ and $C_1K_0h \le \frac12$. Then it follows from (78) that, for any $h \le h_0$, $k \le K_0$,

$$\frac12M_2(\rho_k^h) \le C_1kh^2\sum_{i=1}^k M_2(\rho_i^h) + C_1(4T_0^2 + 1) + C_1M_2(\rho_0). \tag{80}$$

Hence 



K 



K 



- M 2 (p1) < C\Klh 2 M 2 (p?) + K (T + d) + dM 2 (p ) 



i=l 
K« 



< AC 2 T 2 MiU) + K o(To + Cx) + dMM 



(81) 



Consequently, 



K 



Y^Miip*) < 2K (T + C x ) + 2CiM 2 ( /90 ) 



Substituting (82) into (80), we obtain 

2 



M 2 {pl) < - (2 + K ) (To + Ci) + C 1 M 2 {po). 
This finishes the proof of the boundedness of M 2 (p k ) . 



(82) 



(83) 
We now show that the entropy $S(\rho_k^h)$ is also bounded. From (77) and (83), it follows that $S(\rho_k^h)$ is bounded from below. It remains to find an upper bound. Applying Lemma 5.1 for $n = k$, and noting that $\mathcal F(\rho_k^h) \ge 0$, $W_h(\rho_{i-1}^h,\rho_i^h) \ge 0$ for all $i$, we have

$$S(\rho_k^h) \le \mathcal A(\rho_0) + Ch\sum_{i=0}^k M_2(\rho_i^h) + Ckh \le Ch\sum_{i=1}^k M_2(\rho_i^h) + C[S(\rho_0) + M_2(\rho_0)] + 2CT_0. \tag{84}$$

By combining with (82) we obtain the upper bound for the entropy. This completes the proof of the lemma. □



The following lemma extends Lemma 5.2 to any $T > 0$. The proof is the same as Lemma 5.3 in [Hua00], and we omit it.

Lemma 5.3. Let $\{\rho_k^h\}_{k\ge1}$ be the sequence of the minimizers of Scheme 2a or Scheme 2b for fixed $h > 0$. For any $T > 0$, there exists a constant $C > 0$ depending on $T$ and on the initial data such that

$$M_2(\rho_k^h) \le C, \tag{85}$$

$$\sum_{i=1}^k W_h^*(\rho_{i-1}^h,\rho_i^h) \le Ch, \tag{86}$$

$$\int_{\mathbb R^{2d}} \max\{\rho_k^h\log\rho_k^h, 0\}\,dqdp \le C, \tag{87}$$

for any $h \le h_0$ and $k \le K_h$, where

$$K_h = \Big\lceil\frac Th\Big\rceil.$$

For Scheme 2c the same inequalities hold, with (86) replaced by

$$\sum_{i=1}^k \widehat W_h(\mu_i^h,\rho_i^h) \le Ch.$$

6 Proof of Theorem 2.3 



In this section we bring all the parts together to prove Theorem 2.3. The structure of this proof is the same as that of e.g. [JKO98, Hua00], and we refer to those references for the parts that are very similar. The main difference lies in the convergence of the discrete Euler-Lagrange equations for each of the cases to the weak formulation of the Kramers equation as $h \to 0$.

Throughout we fix $T > 0$ and for each $h > 0$ we set

$$K_h := \lceil T/h\rceil.$$

The proof of the space-time weak compactness (36) is the same for the three schemes. Let $(\rho_k^h)_k$ be the sequence of minimizers constructed by any of the three schemes, and let $t \mapsto \rho^h(t)$ be the piecewise-constant interpolation (35). By Lemma 5.3 we have

$$M_2(\rho^h(t)) + \int_{\mathbb R^{2d}} \max\{\rho^h(t)\log\rho^h(t), 0\}\,dqdp \le C, \quad\text{for all } 0 \le t \le T. \tag{88}$$

Since the function $z \mapsto \max\{z\log z, 0\}$ has super-linear growth, (88) guarantees that there exists a subsequence, denoted again by $\rho^h$, and a function $\rho \in L^1((0,T)\times\mathbb R^{2d})$ such that

$$\rho^h \to \rho \quad\text{weakly in } L^1((0,T)\times\mathbb R^{2d}). \tag{89}$$

This proves (36).
The proof of the stronger convergence (37) and of the continuity (38) at $t = 0$ follows the same lines as in [JKO98, Hua00]. The main estimate is the 'equi-near-continuity' estimate

$$d(\rho^h(t_1),\rho^h(t_2))^2 \le C(|t_2 - t_1| + h),$$

where $d(\rho_0,\rho_1)$ is the metric generated by the quadratic cost $|q-q'|^2 + |p-p'|^2$. This estimate follows from the inequality (see (39))

$$|q-q'|^2 + |p-p'|^2 \le C\big[C_h^*(q,p;q',p') + h^2N(q,p) + h^2N(q',p')\big],$$

and the estimates (88) and (86); see [Hua00, Theorem 5.2].

The only remaining statement of Theorem 2.3 is the characterization of the limit in terms of the solution of the Kramers equation, and we now describe this.

Let $\rho^h$ be generated by one of the three schemes. We now prove that the limit $\rho$ satisfies the weak version of the Kramers equation (34). Fix $T > 0$ and $\varphi \in C_c^\infty((-\infty,T)\times\mathbb R^{2d})$; all constants $C$ below depend on the parameters of the problem, on the initial datum $\rho_0$, and on $\varphi$, but are independent of $k$ and of $h$. We first discuss Schemes 2a and 2b.

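Let $P_k^{h*} \in \Gamma(\rho_{k-1}^h,\rho_k^h)$ be the optimal plan for $W_h^*(\rho_{k-1}^h,\rho_k^h)$, where the star indicates the quantities associated with either Scheme 2a or Scheme 2b. For any $0 < t < T$, we have

$$\int_{\mathbb R^{2d}} \big[\rho_k^h(q,p) - \rho_{k-1}^h(q,p)\big]\varphi(t,q,p)\,dqdp = \int_{\mathbb R^{2d}} \rho_k^h(q',p')\varphi(t,q',p')\,dq'dp' - \int_{\mathbb R^{2d}} \rho_{k-1}^h(q,p)\varphi(t,q,p)\,dqdp$$

$$= \int_{\mathbb R^{4d}} \big[\varphi(t,q',p') - \varphi(t,q,p)\big]\,P_k^{h*}(dqdpdq'dp')$$

$$= \int_{\mathbb R^{4d}} \big[(q'-q)\cdot\nabla_{q'}\varphi(t,q',p') + (p'-p)\cdot\nabla_{p'}\varphi(t,q',p')\big]\,P_k^{h*}(dqdpdq'dp') + \varepsilon_k, \tag{90}$$

where

$$|\varepsilon_k| \le C\int_{\mathbb R^{4d}} \big[|q'-q|^2 + |p'-p|^2\big]\,P_k^{h*}(dqdpdq'dp') \overset{(39)}{\le} CW_h^*(\rho_{k-1}^h,\rho_k^h) + Ch^2. \tag{91}$$

By combining (90) with (61) we find

$$\int_{\mathbb R^{2d}} \frac{\rho_k^h(q,p) - \rho_{k-1}^h(q,p)}{h}\,\varphi(t,q,p)\,dqdp$$

$$= \int_{\mathbb R^{2d}} \Big[\frac pm\cdot\nabla_q\varphi(t,q,p) - (\nabla V(q) + \gamma\nabla F(p))\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Big]\rho_k^h(q,p)\,dqdp + \theta_k(t), \tag{92}$$

where

$$|\theta_k(t)| \le \frac{|\varepsilon_k|}{h} + Ch\big[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\big] \overset{(88),(91)}{\le} \frac Ch W_h^*(\rho_{k-1}^h,\rho_k^h) + Ch. \tag{93}$$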
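Note that $\theta_k$ depends on $t$ through the $t$-dependence of $\varphi$. Next, from (92), for $k \ge 1$ we have

$$\int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}} \frac{\rho_k^h(q,p) - \rho_{k-1}^h(q,p)}{h}\,\varphi(t,q,p)\,dqdpdt$$

$$= \int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}} \Big[\frac pm\cdot\nabla_q\varphi(t,q,p) - (\nabla V(q) + \gamma\nabla F(p))\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Big]\rho_k^h(q,p)\,dqdpdt + \int_{(k-1)h}^{kh}\theta_k(t)\,dt$$

$$= \int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}} \Big[\frac pm\cdot\nabla_q\varphi(t,q,p) - (\nabla V(q) + \gamma\nabla F(p))\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Big]\rho^h(t,q,p)\,dqdpdt + \int_{(k-1)h}^{kh}\theta_k(t)\,dt.$$

Summing from $k = 1$ to $K_h$ we obtain

$$\sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}} \frac{\rho_k^h(q,p) - \rho_{k-1}^h(q,p)}{h}\,\varphi(t,q,p)\,dqdpdt$$

$$= \int_0^T\int_{\mathbb R^{2d}} \Big[\frac pm\cdot\nabla_q\varphi(t,q,p) - (\nabla V(q) + \gamma\nabla F(p))\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Big]\rho^h(t,q,p)\,dqdpdt + R_h, \tag{94}$$

where

$$R_h = \sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\theta_k(t)\,dt. \tag{95}$$

By a discrete integration by parts, we can rewrite the left hand side of (94) as

$$-\int_0^h\int_{\mathbb R^{2d}} \rho_0(q,p)\frac{\varphi(t,q,p)}{h}\,dqdpdt + \int_0^T\int_{\mathbb R^{2d}} \rho^h(t,q,p)\frac{\varphi(t,q,p) - \varphi(t+h,q,p)}{h}\,dqdpdt. \tag{96}$$

From (94) and (96) we obtain

$$\int_0^T\int_{\mathbb R^{2d}} \rho^h(t,q,p)\frac{\varphi(t,q,p) - \varphi(t+h,q,p)}{h}\,dqdpdt$$

$$= \int_0^T\int_{\mathbb R^{2d}} \Big[\frac pm\cdot\nabla_q\varphi(t,q,p) - (\nabla V(q) + \gamma\nabla F(p))\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Big]\rho^h(t,q,p)\,dqdpdt$$

$$\quad + \int_0^h\int_{\mathbb R^{2d}} \rho_0(q,p)\frac{\varphi(t,q,p)}{h}\,dqdpdt + R_h. \tag{97}$$

Now $R_h \to 0$ as $h \to 0$, since

$$|R_h| \overset{(95)}{\le} \sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}|\theta_k(t)|\,dt \overset{(93)}{\le} C\sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\Big(\frac1hW_h^*(\rho_{k-1}^h,\rho_k^h) + h\Big)dt = C\sum_{k=1}^{K_h}\big[W_h^*(\rho_{k-1}^h,\rho_k^h) + h^2\big] \overset{(86)}{\le} Ch.$$

Taking the limit $h \to 0$ in (97) yields equation (34).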
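The discrete integration by parts that turns the left hand side of (94) into (96) is the summation-by-parts identity $\sum_{k=1}^K (a_k - a_{k-1})b_k = -a_0b_1 + \sum_{k=1}^K a_k(b_k - b_{k+1})$, valid when $b_{K+1} = 0$. In the proof, $a_k$ plays the role of $\rho_k^h$ and $b_k$ that of the time average of $\varphi$ over $((k-1)h, kh)$, which vanishes beyond $T$. The Python sketch below is an added illustration with hypothetical names and arbitrary rational data; it checks the identity exactly.

```python
from fractions import Fraction as Fr

def difference_form(a, b):
    """sum_{k=1}^{K} (a[k] - a[k-1]) * b[k], where a = [a_0, ..., a_K]."""
    K = len(a) - 1
    return sum((a[k] - a[k - 1]) * b[k] for k in range(1, K + 1))

def summed_by_parts(a, b):
    """-a[0]*b[1] + sum_{k=1}^{K} a[k] * (b[k] - b[k+1]); requires b[K+1] == 0."""
    K = len(a) - 1
    return -a[0] * b[1] + sum(a[k] * (b[k] - b[k + 1]) for k in range(1, K + 1))

# a_0, ..., a_5 stand in for rho_0, ..., rho_K (arbitrary values).
a = [Fr(k * k + 1, k + 2) for k in range(6)]
# b_1, ..., b_5 stand in for averages of the test function; b[0] is unused, b_6 = 0.
b = [None] + [Fr(1, k + 1) for k in range(1, 6)] + [Fr(0)]

print(difference_form(a, b) == summed_by_parts(a, b))
```

The boundary term $-a_0b_1$ corresponds to the integral of $\rho_0\varphi/h$ over $(0,h)$ in (96).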



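For Scheme 2c, only (90) is different:

$$\int_{\mathbb R^{2d}} \big[\rho_k^h(q,p) - \rho_{k-1}^h(q,p)\big]\varphi(t,q,p)\,dqdp = \int_{\mathbb R^{2d}} \rho_k^h(q',p')\varphi(t,q',p')\,dq'dp' - \int_{\mathbb R^{2d}} \rho_{k-1}^h(q,p)\varphi(t,q,p)\,dqdp$$

$$= \int_{\mathbb R^{2d}} \rho_k^h(q',p')\varphi(t,q',p')\,dq'dp' - \int_{\mathbb R^{2d}} \mu_k^h(q,p)\varphi(t,\sigma_h(q,p))\,dqdp$$

$$= \int_{\mathbb R^{4d}} \Big[\varphi(t,q',p') - \varphi\Big(t, q - \frac pm h, p + \nabla V(q)h\Big)\Big]\,P_k^h(dqdpdq'dp')$$

$$= \int_{\mathbb R^{4d}} \Big[\Big(q'-q+\frac pm h\Big)\cdot\nabla_{q'}\varphi(t,q',p') + (p'-p-\nabla V(q)h)\cdot\nabla_{p'}\varphi(t,q',p')\Big]\,P_k^h(dqdpdq'dp') + \varepsilon_k,$$

where

$$|\varepsilon_k| \le C\int_{\mathbb R^{4d}} \Big[\Big|q'-q+\frac pm h\Big|^2 + |p'-p-\nabla V(q)h|^2\Big]\,P_k^h(dqdpdq'dp'),$$

with the constant $C$ depending only on $\varphi$. Since $|p'-p|^2, |q'-q|^2 \le C\widehat C_h(q,p;q',p')$ and $|\nabla V(q)|^2 \le C|q|^2$,

$$\Big|q'-q+\frac pm h\Big|^2 + |p'-p-h\nabla V(q)|^2 \le 2\Big(|q-q'|^2 + \frac{h^2}{m^2}|p|^2 + |p-p'|^2 + h^2|\nabla V(q)|^2\Big) \le C\widehat C_h(q,p;q',p') + Ch^2N(q,p).$$

Therefore

$$|\varepsilon_k| \le C\int_{\mathbb R^{4d}} \big[\widehat C_h(q,p;q',p') + h^2N(q,p) + h^2\big]\,P_k^h(dqdpdq'dp') = C\widehat W_h(\mu_k^h,\rho_k^h) + CM_2(\mu_k^h)h^2 + Ch^2 \le C\widehat W_h(\mu_k^h,\rho_k^h) + Ch^2.$$

The rest of the proof is the same.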